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ABSTRACT 


Expectation  Maximization  (EM)  is  a  general  purpose  algorithm  for  solving 
maximum  likelihood  estimation  problems  in  a  wide  variety  of  situations  best  described  as 
incomplete  data  problems.  The  incompleteness  of  the  data  may  arise  due  to  missing  data, 
truncated  distributions,  etc.  One  such  case  is  a  mixture  model,  where  the  class  association 
of  the  data  is  unknown.  In  these  models,  the  EM  algorithm  is  used  to  estimate  the 
parameters  of  parametric  mixture  distributions  along  with  their  probabilities  of 
occurrence.  In  this  thesis,  the  EM  algorithm  is  employed  to  estimate  different  mixture 
models  for  raw  single  and  multi-band  electro-optical  Infra  Red  (IR)  data.  The  EM  update 
equations  for  single  and  multi-band  Gaussian  and  single-band  Gamma  and  Beta  mixture 
models  are  discussed.  Gaussian  mixture  models  are  used  for  the  raw  image  segmentation 
of  single  and  multi-band  imagery.  Three  different  anomaly  detection  techniques  based  on 
EM-based  image  segmentation  are  discussed  and  evaluated.  The  Gamma  and  Beta 
mixture  models  are  used  to  model  the  detection  statistic  of  two  different  anomaly 
detectors.  An  adaptive  GEAR  (Constant  Ealse  Alarm  Rate)  threshold  selection  based  on 
the  mixture  model  of  the  detection  statistic  has  been  implemented  to  determine  potential 
target  locations.  These  mixture  models  of  detection  statistics  can  also  be  used  for  multi¬ 
sensor  or  multi-algorithm  fusion.  The  algorithms  have  been  evaluated  using  single-band 
mid-wave  IR  airborne  imagery  for  mine  and  mine  field  detection  problems. 
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1.  INTRODUCTION 


1.1.  THE  EM  ALGORITHM 

The  EM  algorithm  is  a  technique  for  maximum  likelihood  estimation  in  situations 
best  described  as  incomplete  data  problems  [1].  It  is  so  called  because  of  its  two 
important  steps — Expectation  (E  step)  and  Maximization  (M  step).  The  EM  algorithm 
seeks  to  iteratively  compute  the  maximum  likelihood  estimates  and  it  is  very  useful  in 
situations  where  algorithms  such  as  Newton-Raphson,  Prediction-Error,  Sliding  Window 
and  Eeast-Squares  turn  out  to  be  tedious  and  time  consuming.  EM  has  specifically  gained 
importance  because  in  certain  incomplete  data  situations,  the  maximum  likelihood 
estimation  can  be  difficult  due  to  the  absence  of  the  data.  If  the  same  problem  is 
converted  to  a  complete  data  problem  with  additional  unknown  parameters,  then  the 
problem  can  be  solved  more  easily  using  EM  iterations. 

Although  these  incomplete  data  problems  can  arise  in  different  situations,  this 
thesis  will  study  the  incomplete  data  problems  as  applied  to  mixture  models.  In 
background  modeling,  the  background  data  can  be  characterized  as  coming  from  a  set  of 
different  probability  distributions.  This  problem  is  an  incomplete  data  problem  in  the 
sense  that  the  class  wise  association  of  the  data  is  unknown.  Also  the  proportions  of 
different  classes  are  not  known.  Thus  in  these  situations  the  EM  algorithm  can  be  applied 
to  distribute  the  data  into  classes  and  to  find  the  class  proportions  and  parameters  of 
distribution  in  a  parametric  mixture  model.  The  EM  algorithm  and  its  concepts  are 
discussed  in  detail  in  Section  2  of  this  thesis. 


1.2.  BACKGROUND  MODELING 

Background  modeling  is  an  efficient  way  to  characterize  the  background  data  with 
certain  probability  distributions.  These  distributions  are  in  the  form  of  mixture  models. 
Because  the  data  sometimes  is  of  high  dimensionality,  non-parametric  methods  of  density 
estimation,  such  as  kernel-based  methods,  would  require  large  amounts  of  training  data. 
This  makes  it  important  for  us  to  study  modeling  using  parameter  estimation.  Also,  in 
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certain  situations  parametric  modeling  makes  the  analysis  more  robust.  This  is  because  if 
the  system  has  been  modeled  using  parametrie  distribution,  then  any  unpredictability  can 
be  aeeounted  for  by  adjusting  the  parameters  of  the  model  based  on  knowledge  from  past 
modeling  experiences  [6]. 

In  many  problems  such  as  minefield  deteetion,  target  reeognition  and  echoloeation 
systems,  background  characterization  is  required  for  a  proper  interpretation  and  analysis 
of  the  data.  For  example  the  EM-based  anomaly  detectors  use  the  mixture  model 
framework,  where  the  concept  of  pixel  membership  is  used  to  detect  anomalies  in  the 
data.  Modeling  of  deteetion  statistic  can  also  be  performed.  The  detection  statistic 
represents  the  non-homogeneities  and  spatial  correlation  in  the  data,  and  therefore  its 
modeling  into  parametric  distributions  is  very  important.  Modeling  of  detection  statistic 
also  helps  in  performing  adaptive  Constant  False  Alarm  Rate  (GEAR)  threshold  seleetion, 
which  is  important  for  practical  detection  systems  and  also  for  sensor  fusion  [7]. 
Modeling  of  the  deteetion  statistic  is  discussed  in  Section  5  of  this  thesis. 

Background  modeling  is  useful  in  image  segmentation  where  the  image  is 
segmented  into  various  regions.  Background  modeling  is  also  used  in  anomaly  detection. 
Here,  the  concept  of  data  membership  is  exploited  to  separate  the  anomalies  that  are 
statistically  different  from  the  background.  These  concepts  are  discussed  in  Sections  4,  5 
and  6  of  this  thesis. 


1.3.  APPLICATIONS  OF  BACKGROUND  MODELING 

1.3.1.  Echoloeation  Systems.  In  the  echoloeation  systems,  radio  and  sound  waves 
are  transmitted,  and  from  the  eeho  of  the  refleeted  waves,  it  is  possible  to  get  information 
such  as  the  location,  direction  and  size  of  the  target.  In  these  systems,  it  is  sometimes 
desirable  and  needed  to  estimate  the  statistical  behavior  of  the  target  and  the  environment 
in  which  it  operates.  There  are  three  main  motivations  for  this  [6]; 

(i)  An  appropriate  setting  of  the  deteetion  threshold  of  an  eeholocation  system  is 
required  to  eontrol  the  false  alarm  rate.  The  reason  for  this  is  that  in  many 
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active  SONAR  and  RADAR  systems,  the  deeision  to  deelare  the  presenee  of  a 
target  depends  on  the  amplitude  of  the  matched  fdter  output  exceeding  a 
eertain  threshold.  The  estimate  of  the  background  probability  density  function 
of  the  matched  filter  amplitude  is  required  if  the  threshold  is  to  be  determined 
by  parametrie  signal  proeessing  teehniques  [7]. 

(ii)  It  is  advantageous  if  a  system  deteetion  performanee  can  be  predicted  by 
means  of  measurement  of  the  historieal  data.  For  example,  if  the  baekground 
is  eharaeterized  using  parametrie  distributions,  then  in  the  ease  of  unfavorable 
eireumstanees,  the  parameters  (sueh  as  frequeney  and  wavelength)  ean  be 
adjusted  to  eombat  those  unfavorable  eireumstanees. 

(iii)  Knowledge  of  the  baekground  statisties  may  be  used  to  improve  the  design  of 
the  deteetor  in  an  optimal  sense. 

1.3.2.  Automatic  Target  Recognition  (ATR)  Systems.  ATR  systems  generally 
have  a  multistage  architeeture  for  target  detection  or  recognition.  The  first  stage  is 
generally  a  pre-sereener.  It  seleets  the  targets  at  a  given  CFAR  that  are  passed  on  to  the 
next  stage,  which  is  the  diserimination  stage.  The  discrimination  stage  is  basically  a  false 
alarm  mitigation  seheme  that  rejeets  the  false  alarms  based  on  eertain  features  [34],  [35]. 
Finally,  the  elassifieation  stage  is  used  to  deteet  the  targets.  The  need  for  modeling  comes 
in  the  pre-sereener  stage  where  an  effieient  modeling  can  be  used  for  adaptive  CFAR 
threshold  seleetion.  This  helps  in  the  seleetion  of  the  optimum  number  of  targets  that  are 
then  passed  to  the  next  stage. 

Also  in  these  systems,  understanding  of  the  battlefield  would  be  greatly  enhaneed 
if  the  surveillanee  systems  used  could  also  provide  automated,  reliable  elassifieation  of 
objects  in  the  areas  surveyed.  The  volume  of  the  data  produeed  by  surveillanee  makes  it 
infeasible  for  the  human  interpretation  of  the  data.  The  RADAR  Range  Profiles  (RRPs) 
data  are  used  in  the  ATR  systems.  Sinee  the  data  are  of  a  high  dimensionality,  non- 
parametrie  methods  of  density  estimation,  sueh  as  kernel-based  methods,  would  require  a 
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large  amount  of  training  data.  Approximating  the  probability  density  by  a  Gaussian 
distribution  alone  may  make  the  analysis  rigid. 

In  these  eases,  mixture  models  provide  a  parametrie  method  that  is  flexible  for  the 
modeling  and  analysis,  as  this  would  help  in  modeling  a  wide  range  of  possible  densities, 
ineluding  multi-modal  densities.  In  ease  of  the  RADAR  data,  there  is  a  strong  motivation 
for  the  mixture  model  to  unseramble  returns  from  multiple  targets  [6].  In  order  to  form 
the  mixture  models  using  parametrie  distributions,  an  effieient  algorithm  is  needed.  The 
EM  algorithm  is  one  sueh  algorithm  that  effieiently  models  the  baekground  data  with  the 
probability  models. 


1,4,  MINEFIELD  DETECTION 

In  minefield  deteetion,  an  effieient  elassifieation  and  eharaeterization  of  the 
baekground  is  to  be  done.  The  aim  is  to  separate  the  mines  (anomalies)  statistieally  from 
the  baekground.  Following  are  the  motivations  for  baekground  analysis: 

(i)  The  deteetion  performanee  of  any  system  depends  on  the  baekground 
eharaeteristies  and  terrain.  A  basie  knowledge  of  an  anomaly  deteetor  sueh  as 
an  RX  algorithm  shows  that  the  performanee  over  different  types  of 
baekgrounds  depends  on  the  eorrelation  and  non-homogeneities  in  the 
baekground.  Thus  in  order  to  quantify  the  deteetor  performanee,  it  is 
neeessary  to  model  the  deteetion  statisties  of  the  anomaly  deteetor  into 
probabilistie  models  that  would  help  improve  the  performanee  of  the  anomaly 
deteetor. 

(ii)  Modeling  helps  to  statistieally  differentiate  the  baekground  and  therefore 
helps  in  forming  guidelines  that  would  help  in  studying  the  deteetability  and 
likelihood  of  mines  in  different  types  of  baekgrounds  and  terrains. 
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(iii)  Modeling  also  helps  in  extensive  analysis  of  threshold  determination  for  a 
given  CFAR.  This  threshold  is  used  to  separate  the  baekground  from  the 
targets. 

Automatic/ Aided  Target  Recognition  (ATR/AiTR)  systems  are  often  airborne  in 
nature.  Collection  of  data  from  a  large  area  mandates  the  development  of  these  systems. 
These  systems  have  a  pre-screening  stage  that  is  required  to  detect  potential  target 
locations.  Background  modeling  of  the  detection  statistic  is  very  useful  in  the  pre¬ 
screening  stage,  where  an  adaptive  CFAR  threshold  selection  is  performed  to  select  a 
threshold  based  on  detection  statistics.  The  detection  statistic  represents  non¬ 
homogeneities  and  spatial  correlation  of  the  data.  Therefore,  the  adaptive  threshold 
selection  based  on  detection  statistic  helps  in  selecting  a  threshold  that  is  invariant  to  the 
background  the  detector  is  operating  in. 

Thus,  it  is  evident  that  minefield  detection  forms  an  important  application  of 
background  modeling.  This  application  is  explored  in  greater  details  in  this  thesis. 


1.5.  OVERVIEW  OF  THE  THESIS 

In  Section  2,  the  EM  algorithm  is  introduced.  The  concept  and  mechanism  of  the 
EM  algorithm  is  studied  in  detail  as  the  EM  algorithm  is  the  backbone  of  the  mixture 
model  architecture.  The  development  and  the  mathematical  formulation  of  the  algorithm 
is  presented. 

In  Section  3,  the  concepts  of  the  Image  Segmentation  using  the  EM  algorithm  are 
presented,  where  the  image  is  segmented  into  regions  that  are  statistically  different.  The 
region  belonging  to  a  particular  class  appears  as  a  separate  segment  in  the  image. 

Section  4  covers  the  various  anomaly  detectors  that  have  been  implemented  based 
on  mixture  modeling.  The  mechanism  and  concept  of  the  different  type  of  EM-based 
anomaly  detectors  is  presented. 
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Section  5  introduces  various  mixture  models  that  have  been  implemented  to  model 
the  detection  statistic.  In  order  to  test  the  modeling  performance  of  these  mixture  models, 
the  Chi-Square  test  is  described  that  evaluates  the  modeling  performance  of  a  given 
distribution.  The  two  parameter  Beta  and  Gamma  mixture  models  are  tested  on  two 
different  statistics  for  extensive  airborne  data.  The  performance  of  the  mixture  models  is 
then  compared. 

Section  6  discusses  the  automatic  CFAR  threshold  selection  from  the  modeling 
results.  These  thresholds  are  calculated  for  a  given  CFAR  value.  These  thresholds  that  are 
determined  from  the  modeling  results  are  used  to  determine  the  targets  that  are  candidates 
for  further  processing. 

Appendix — A  lists  various  mixture  models  implemented  along  with  their 
distributions  and  the  update  equations.  It  also  shows  the  use  of  the  update  equations  to 
estimate  the  parameters  of  the  mixture  models. 

Appendix — B  discusses  different  tests  that  evaluate  the  goodness  of  fit  of  the 
mixture  models  implemented. 
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2.  THE  EM  ALGORITHM 

2,1.  EM— AN  INTRODUCTION 

The  EM  algorithm  is  a  general  purpose  algorithm  for  maximum  likelihood 
estimation  in  a  wide  variety  of  situations  best  deseribed  as  ineomplete  data  problems.  The 
ineomplete  data  problems  arise,  for  example,  where  there  are  missing  data,  truncated 
distributions,  censored  or  grouped  distributions  and  also  in  situations  where  the  missing 
data  are  not  evident.  One  such  case  is  a  mixture  model,  where  the  class  association  of  the 
data  is  unknown.  The  data  is  assumed  to  belong  to  a  parametric  mixture  model  but  the 
proportion  of  each  class  is  unknown. 


It  is  to  be  noted  that  some  problems  at  first  sight  may  not  seem  to  be  incomplete 
data  problems  but  they  actually  are.  Direct  solution  of  these  problems  can  be  tedious  and 
unstable.  There  can  be  a  great  reduction  in  computation  if  the  problem  could  be 
converted  to  a  complete  data  problem  because  in  these  situations,  the  complexity  of  the 
Maximum  Likelihood  (ML)  estimation  reduces  if  the  complete  data  is  provided  because 
the  log-likelihood  or  the  cost  function  for  complete  data  problem  is  often  of  a  nice  and 
tractable  form.  However,  in  some  cases  the  ML  equations  do  not  have  explicit  solutions, 
and  therefore  one  has  to  resort  to  some  iterative  methods  to  arrive  at  the  solution.  EM  is 
one  such  efficient  iterative  technique.  The  conversion  of  incomplete  data  problem  into  a 
complete  data  problem  is  discussed  more  in  Section  2.5. 

2,2,  MOTIVATION  FOR  EM 

Many  attempts  have  been  made  to  estimate  the  parameters  using  the  training  data. 
In  literature,  this  has  been  referred  to  as  a  supervised  approach.  One  major  drawback  of 
this  type  of  method  is  that  it  is  unrealistic  because  many  times  the  training  data  with 
reliable  class  association  is  unavailable.  Therefore,  attempts  have  been  made  to  estimate 
the  parameters  using  some  unsupervised  technique,  i.e.  the  one  that  does  not  require  the 
training  data.  EM  is  one  such  technique.  Statisticians  have  used  EM  to  estimate 
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parameters  for  the  ineomplete  data  problem  beeause  EM  works  very  well  for  this  type  of 
praetieal  problems.  In  reeent  times,  there  has  been  considerable  interest  in  stochastic 
model-based  image  segmentation.  Here  the  image  is  separated  into  a  set  of  disjointed 
regions,  and  each  region  is  associated  with  one  of  a  finite  number  of  classes.  Each  class  is 
assumed  to  have  been  modeled  as  a  random  field.  Because  these  random  fields  are  often 
parametric  models,  an  important  problem  that  one  is  faced  with  is  regarding  parameter 
estimation.  Clearly,  the  parameter  estimation  problem  here  is  an  incomplete  data 
problem,  because  the  observed  image  is  a  mixture  of  several  data  classes  with  the  class 
status  of  each  pixel  unknown,  which  means  that  the  correct  segmentation  is  not  known. 


2,3.  EM— A  BRIEF  HISTORY 

The  name  ‘EM’  was  coined  by  Dempster,  Eaird  and  Rubin  in  a  paper  in  1976  and 
was  published  in  the  Journal  of  Royal  Statistical  Society  in  1977  [48].  Because  the  idea 
behind  the  EM  algorithm  is  very  general,  algorithms  like  it  were  formulated  and  applied 
in  a  variety  of  problems  even  before  the  paper  was  presented.  However,  it  was  in  the 
paper  presented  by  Dempster,  Eaird  and  Rubin  that  various  ideas  were  synthesized  and  a 
general  theory  was  developed.  The  various  references  to  literature  on  an  EM-type  of 
algorithm  can  be  found  in  [1]. 


2,4,  EM— THE  CONCEPT 

The  EM  algorithm  estimates  the  parameters  of  the  mixture  model  iteratively, 
starting  from  some  initial  guess.  Each  iteration  consists  of  the  following  two  steps: 


Step  1:  (Expectation):  This  step  tries  to  find  the  distribution  of  the  complete  data  given 
the  known  values  of  the  observed  data  and  a  current  estimate  of  the  parameters.  The 
estimation  step  basically  involves  the  formulation  of  a  ‘g’  function  (to  be  described 
later),  which  is  basically  the  estimation  of  the  likelihood  function  of  the  complete  data 
given  the  observed  data  and  the  current  fit  of  parameters  (i.e.  the  parameters  obtained  in 
the  maximization  step  of  the  previous  iteration). 


9 


Step  2:  (Maximization):  This  step  re-estimates  the  parameters  to  be  those  with  maximum 
likelihood  under  the  assumption  that  the  distribution  found  in  the  estimation  step  is 
correct.  The  maximization  step  is  so  called  because  it  maximizes  the  'Q'  function 
formulated  in  the  estimation  step  to  obtain  a  new  set  (fit)  of  parameters. 

Each  step  is  carried  out  in  the  above  order  until  the  terminating  condition  is 
reached.  The  terminating  condition  can  be  assumed  to  have  been  reached  when  the  log- 
likelihood  function  of  the  complete  data  does  not  show  significant  improvement  over  its 
previous  value.  It  can  be  shown  that  each  successive  iteration  either  improves  the  true 
likelihood  or  leaves  it  unchanged  (if  a  local  maximum  has  already  been  reached)  [1]. 


2,5.  MATHEMATICAL  FORMULATION  OF  EM 

Let  us  suppose  that  for  any  practical  situation: 

X  =  Complete  Data  (observed  plus  unobserved), 
y  =  Observed  (incomplete)  data, 

z  =  Additional  data  which  is  missing  (or  is  unobservable). 

Also  f{y,(p)  is  the  probability  density  function  (pdf)  of  the  observed  incomplete 
data,  where  ‘  ^  ’  is  the  set  of  parameters  that  characterize  the  pdf. 

The  problem  is  to  estimate  ‘  (p  ’  based  on  the  incomplete  information  represented 
by  the  observed  data,  ‘  y  The  likelihood  function,  £((p  |  y),  for  the  parameter,  ‘  (p  ’  given 
‘  y  ’  can  be  formed  as: 

£{<p\y)  =  (2.1) 

Log-likelihood  function,  ln[£(^  |  y)]  can  be  formed  where  'In'  is  the  natural  logarithm. 


ln[£(^  I  y)]  =  ln[/(y,^)] . 


(2.2) 
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A  log-likelihood  function  is  considered  because  it  makes  the  analysis  and 
calculations  easier  without  changing  the  optimization  problem  at  ah. 

The  maximum  likelihood  estimation  problem  is  complicated  by  the  fact  that  only 
the  incomplete  data  is  at  hand.  Thus  in  order  to  ease  the  problem,  the  incomplete  log- 
likelihood  function  is  converted  to  the  complete  log-likelihood  function  by  adding  the 
missing  information,  ‘  z  .’  The  log-likelihood  of  the  complete  data,  {y,z}=  {x},  is  defined 
as: 


InK  if  I  T’  ^)]  =  {(p  I  x)]  =  ln[/(x  |  (p)]  ,  (2.3) 

where  £X^\^)  is  the  likelihood  function  of  the  complete  data. 

Therefore  the  EM  algorithm  approaches  the  problem  of  solving  the  incomplete  data 
likelihood  function  indirectly  by  proceeding  iteratively  in  terms  of  the  complete  data  log- 
likelihood  function,  |  x).  As  the  complete  data  is  unobservable,  it  is  replaced  by  its 
conditional  expectation  given  the  observed  data,  and  the  current  fit  of  parameters,  i.e 


ln[/(x  I  (p)]  =  ln[/(y  |  (p).f[z  \  y,  (p)] .  (2.4) 

This  step  is  discussed  in  more  detail  in  specific  context  of  Gaussian  mixture  model 
for  image  segmentation  in  section  3.4. 

Starting  from  some  initial  guess  of  the  parameters,  ‘  ^3°,’  the  EM  algorithm  iterates 
between  the  two  steps,  known  as  the  ‘E’  and  ‘M’  steps.  These  are  described  as  follows: 

E  (Estimation)  Step:  This  step  forms  the  ‘  Q  ’  function  by  estimating  the  log-likelihood  of 
the  complete  data  given  the  observed  data  and  current  fit  of  parameters.  The  current  fit  of 
parameters  is  the  set  of  parameters  that  is  obtained  from  the  maximization  step  of  the 
previous  iteration. 
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The  ‘  Q  ’  function  is  formulated  as: 


q{(P,  (p’^ )  =  [ln{4  {(p\x)]  I  y,(p^  ]  ,  (2.5) 

where  ‘  cp^  ’  represents  the  current  set  of  parameters  and  ‘  ’  represents  the  expectation 

of  the  log-likelihood  of  the  complete  function  over  ‘ 


M  (Maximization)  Step:  This  step  maximizes  the  ‘  Q  ’  function  in  Equation  (2.5),  to  get  a 
new  set  of  parameters,  so  that. 


E\ 


d(p 


=  0. 


(2.6) 


The  new  value  of  the  parameter  ‘  ^  ’  is  given  by: 


?>“=?>*+ (4  )'■£ 


s\q(v,v'‘] 


d(p 


(2.7) 


where  ‘  ’  is  the  information  matrix,  calculated  at  ‘  Information  matrix  for  different 

mixture  models  is  discussed  in  more  detail  in  appendix-A. 

Once  the  parameters,  ''  cp'" are  obtained  in  the  iteration,  the  ‘E’  step  is  carried 
out  again,  and  the  ‘  g  ’  function  is  updated  with  these  new  parameters  to  form  the  new 
estimate  of  the  log-  likelihood  of  the  complete  data.  The  ‘M’  step  is  then  carried  again  for 
the  new  updated  ‘  Q  ’  function  to  yield  a  more  updated  set  of  parameters,  ‘  ^3^^' ,’  and  the 
process  is  carried  on  until  the  difference,  |  x)-f^(99*  |  x),  changes  by  an 

arbitrarily  small  amount.  At  this  point,  the  most  likely  set  of  parameters  is  reached.  This 
condition  is  often  used  as  the  terminating  condition. 
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2,6.  A  VARIANT  OF  EM— STOCHASTIC  EM 

One  major  weakness  of  EM  is  its  vulnerability  to  the  initial  values,  ‘  ^3°.’  If  these 
are  “far”  from  the  aetual  values,  then  there  may  be  eases  when  the  algorithm  does  not 
eonverge  to  the  aetual  value.  This  happens  because  the  EM  algorithm  sometimes  gets 
trapped  into  local  maxima,  (if  it  finds  one)  before  the  actual  global  maxima.  These  are 
known  as  saddle  points.  If  this  happens,  then  the  parameter  values  might  get  stuck  to 
these  saddle  points,  and  therefore  the  parameters  may  be  incorrectly  estimated. 


The  weakness  is  partially  removed  with  SEM  that  resembles  EM.  In  case  of 
standard  EM,  the  indicator  variables  are  obtained  using  maximum  likelihood  and  the 
class  probabilities,  ‘  In  case  of  SEM  the  value  of  these  indicator  variables  is  obtained 
based  on  a  draw  using  the  current  class  probabilities,  ‘  ^3^  .’  This  assigns  each  sample  to 
one  of  the  classes  probabilistically  in  proportion  to  the  class  probabilities,  ‘  ^3^  .’  Because 
of  the  random  nature  of  SEM,  it  does  not  remain  confined  to  a  local  maxima  and  instead 
is  more  likely  to  progress  towards  global  maxima. 

In  image  modeling,  another  form  of  EM,  known  as  SEM-EM,  is  often  used.  This  is 
so  called  because  SEM  is  used  to  run  for  the  initial  major  part  of  the  iterations  (say  75%). 
After  this  initial  “warm  up”,  EM  takes  over.  It  is  assumed  that  after  using  SEM  for  75% 
of  the  iterations,  the  results  are  close  to  the  actual  values.  The  EM  then  runs  until  the  end, 
converging  to  global  maxima.  The  set  of  parameter  values  obtained  in  the  last  certain 
number  of  iterations  (for  example  100)  are  averaged  to  give  the  values  of  final  converged 
parameters. 

2.7.  CONVERGENCE  PROPERTIES  OF  EM 

As  seen  in  the  previous  section,  it  is  possible  for  the  parameters  to  converge  to  a 
saddle  point  rather  than  a  global  point.  This  depends  a  lot  on  the  type  of  log-likelihood 
function.  If  the  log-likelihood  function  is  uni-modal,  then  the  convergence  of  the 
likelihood  function  and  the  parameters  is  unique.  If  the  likelihood  function  is  not  uni- 
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modal,  then  in  that  case  the  likelihood  function  and  the  parameters  might  converge  to 
some  saddle  point. 


The  situation  can  be  analyzed  by  using  the  analogy  of  a  round  bowl.  The  likelihood 
surface  can  be  thought  to  be  a  bowl.  The  bowl  can  have  a  steep  or  flat  bottom.  If  the  bowl 
has  a  steep  bottom,  then  any  round  ball  that  is  put  at  a  certain  position  inside  the  bowl 
finally  reaches  the  steep  bottom  after  certain  time  and  remains  there.  However,  if  the 
bottom  is  fiat,  then  the  ball  might  reach  the  fiat  bottom  and  circle  around  the  flat  bottom 
surface  rather  than  reaching  a  particular  point  at  the  bottom. 

Similarly,  it  has  been  observed  in  some  cases  that  if  the  number  of  parameters  in  a 
distribution  is  large,  then  the  parameters  tend  to  undergo  periodic  oscillations  after  a 
certain  number  of  iterations.  This  happens  because  the  likelihood  surface  tends  to  become 
flat  as  the  number  of  parameters  in  a  distribution  is  large.  Thus  rather  than  converging  to 
a  steady  value,  the  parameters  of  such  type  of  distribution  tend  to  converge  to  a  range. 
When  this  happens,  EM  is  said  to  converge  to  a  circle  rather  than  a  single  point  [1]. 


2,8.  SELECTION  OF  NUMBER  OF  CLASSES 


Choosing  the  number  of  classes  is  an  important  issue  in  the  EM-based  applications. 
However  no  fully  general  and  satisfactory  solution  seems  available.  The  most  common 
approach  of  selecting  the  number  of  classes  is  based  on  the  log-likelihood  of  the  samples 
given  the  number  of  classes  [41].  Eet  ‘  x  ’  be  the  data,  ‘g’  be  the  number  of  classes  and 

A 

‘  ^  ’  be  the  parameters  of  the  mixture  model  that  are  estimated  from  the  data.  Then  a 
criteria  based  on  log-likelihood  has  been  given  in  [42]  using  the  Bayesian  Information 
Criteria  (BIC)  approximation.  This  is  given  by; 


BIC  =  2  log  p 


A 


x\g,(P 


J 


-  N  log(n) 


(2.8) 


Here,  N ^  is  the  number  of  parameters  estimated  from  the  data,  and  n  is  the  number  of 
samples. 
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The  larger  the  value  of  BIC,  the  better  the  model  is  aeeording  to  this  eriterion. 
However  in  order  to  ineorporate  this  eriterion,  one  has  to  run  the  algorithm  for  eertain 
number  of  times  (eaeh  time  with  a  different  number  of  elass)  and  then  has  to  seleet  the 
number  of  elasses  that  gives  the  maximum  value  of  BIC.  This  is  time  eonsuming  and 
therefore  the  number  of  elasses  is  generally  pre-speeified.  The  BIC  eriterion  for  seleeting 
the  number  of  elasses  is  diseussed  in  detail  in  [42], 

2.9.  APPLICATIONS  OF  EM 

The  EM  algorithm  has  seen  wide  applieations  in  different  fields.  Some  of  the 
applieations  where  the  EM  algorithm  and  its  variants  are  widely  used  are  the  following; 


2.9.1.  Background  Modeling.  The  applieations  of  the  EM  algorithm  in  ATR 
systems,  eeholoeation  systems  and  minefield  deteetion  have  already  been  diseussed  in 
Seetion  1.  In  these  systems,  the  EM  algorithm  helps  to  eharaeterize  the  baekground, 
whieh  then  helps  to  eonduet  different  types  of  analysis.  Modeling  of  deteetion  statisties 
gives  an  insight  to  the  nature  of  the  baekground.  The  eoneepts  and  applieations  of 
baekground  modeling  and  modeling  of  deteetion  statisties  are  diseussed  in  Seetions  3,  4, 
5  and  6  of  this  thesis. 

2.9.2.  Speech  Recognition.  The  Automatie  Speeeh  Reeognition  (ASR)  systems 
have  been  making  signifieant  progress,  but  an  important  eonsideration  is  in  its  robustness 
to  different  speaking  styles  and  environments.  ASR  systems  trained  in  one  environment 
perform  poorly  in  the  other  environments  due  to  a  mismateh  between  testing  and  training 
eonditions.  These  mismatehes  are  eaused  by  different  speaking  styles,  the  presenee  of 
noise  and  insuffieient  eharaeterization  of  the  speeeh  signal  [22]. 

To  reduee  this  mismateh,  mappings  and  transformations  are  done  between  the  test 
utteranees  and  original  models.  Eet  ‘  A^  ’  be  a  set  of  trained  Hidden  Markov  Models 
(HMMs),  where  the  subseript  ‘  x  ’  denotes  that  the  models  are  trained  on  given  set  of 
training  data,  {X].  Eet  the  test  utteranee,  ‘T,’  be  given  as:  Y  =  j, . y^\.  The 
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problem  is  to  recognize  the  word  sequence,  ‘IF,’  from  ‘7,’  where  ‘IF  is  given  by 
W  =  {wi,W2, . w^}. 

Any  mismatch  between  ‘Jf  and  ‘F  results  in  error  in  recognizing  the  word 
sequence,  ‘IF.’  In  order  to  reduce  this  mismatch,  ‘F  is  mapped  to  ‘A’  using  the 
transformation  function,  ‘  ’  such  that. 


X  =  F^{y).  (2.9) 

Here,  ‘  v  ’  are  the  parameters  of  the  transformation  function,  ‘  .  ’  The  mismatch 

can  be  reduced  by  finding  the  parameters  ‘v’  such  that  the  likelihood,  />(T|v,A^)  is 
maximized.  Therefore  ‘  v  ’  can  be  obtained  as: 

v' =  argmax/»(T  I  v,A^).  (2.10) 


Thus  ‘v’  can  be  obtained  using  the  EM  algorithm,  and  the  transformation  function, 
‘  F,,’  can  be  obtained.  This  decreases  the  distortion  caused  by  the  mismatch.  A  discussion 
on  this  approach  can  be  found  in  [43]  and  [44]. 

2,9.3.  Medical  Imaging,  Magnetic  Resonance  (MR)  imaging  is  an  advanced 
medical  imaging  technique  providing  rich  information  about  the  human  soft  tissue 
anatomy.  For  brain  MR  images,  the  only  method  developed  to  date  for  statistical 
segmentation  of  brain  MR  images  is  based  on  the  finite  mixture  (FM)  model,  in  particular 
the  finite  Gaussian  mixture  (FGM)  model  where  the  Gaussian  distribution  is  assumed 
because  of  its  simple  mathematical  form  and  the  piecewise  constant  nature  of  ideal  brain 
MR  images  [40].  The  mixture  model  consists  of  different  tissue  classes,  especially  gray 
matter  (GM),  white  matter  (WM)  and  cerebrospinal  fluid  (CSF).  The  EM  algorithm  is 
used  to  estimate  the  parameters  of  the  mixture  model,  and  assign  class  labels  to  all  the 
pixels.  It  then  segments  the  image  by  representing  all  samples  that  belong  to  a  particular 
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class  as  a  distinct  region  in  the  image.  The  discussion  on  EM  model  and  image 
segmentation  in  MR  imaging  ean  be  found  in  [21], 

2,9.4,  Dairy  Science,  Quantitative  variation  in  traits  that  ehange  with  age  is 
important  to  animal  breeders.  Often  biologieal  traits  sueh  as  body  size,  weight  or  growth 
are  measured  at  various  times  or  ages.  These  traits  are  highly  correlated.  A  (co)variance 
function  is  a  continuous  function  that  represents  the  variance  and  covariance  of  sueh 
traits  measured  at  different  points  of  time.  The  (co)varianee  function  is  useful  when 
spatial  or  temporal  data  are  modeled  and  can  be  linked  to  time-dependent  phenomena 
such  as  growth  of  lactation.  Using  these  (co)variance  funetions,  information  sueh  as  milk- 
yield  etc.  can  be  predicted  for  different  points. 

Let  ‘X’  denote  the  covariance  matrix  caleulated  at  different  times  and  ‘  (p  ’ 

represent  the  matrix  of  polynomial  functions  evaluated  at  these  times  then  the  observed 
eovarianee  matrix  ean  be  given  by: 

l^-(p.K.(p\  (2.11) 

where  denotes  the  transpose  of  Here  ‘A’  denotes  the  coefficients  of  the 
(eo)variance  funetion.  ‘A’  can  be  estimated  as: 

K  =  (p-^Y}fp-^)\  (2.12) 


0 

Once  ‘A’  is  obtained,  it  is  used  to  get  updated  covariance  matrix,  ^  •  This  covariance 
matrix  is  then  used  to  give  a  better  estimate  of  ‘A.’  Thus  the  EM  algorithm  is  used  to 
estimate  the  (co)variance  function  coefficients  for  different  measurements  such  as  milk, 
fat  test-day  yield  ete.  Estimation  of  the  (eo)varianee  funetion  using  EM  has  been 
discussed  in  [20],  [38]  and  [39]. 
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2,9.5,  AIDS  Epidemiology.  The  forecasting  of  future  AIDS  incidences  using  the 
estimates  of  past  and  present  HIV  incidences  is  very  important  for  public  health  planning. 
Data  relating  to  HIV  infection  is  incomplete  due  to  missing  information  and  delayed 
reporting  etc.  Also  there  is  an  unknown  time  interval  between  the  infection  and  diagnosis. 
All  these  aspects  lead  to  the  incompleteness  of  the  problem  and  therefore  in  order  to 
forecast  the  AIDS  incidence,  the  EM  algorithm  is  used  [1],  [18]. 

Let  us  suppose  that: 

At  =  Number  of  cases  diagnosed  as  AIDS  during  the  month  ‘t.’ 

Nt  =  Number  of  individuals  infected  with  HIV  in  month  ‘t.’ 

Pk  =  probability  that  duration  of  infection  is  ‘A:’  months. 

Clearly,  ‘A:’  denotes  the  period  between  infection  and  diagnosis,  in  months. 


=Y^KP,-.,,  ■  (2-13> 

C=1 


Let, 


where,  ‘L’  denotes  the  expectation. 

The  prediction  of  AIDS  incidences  for  any  future  month,  ‘t,’  is  given  by: 


Pt  =^KPt-c^i 

c=\ 


(2.14) 
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The  parameter,  is  estimated  using  EM  teehniques.  Thus  this  helps  in  the 
foreeasts  of  the  future  AIDS  ineidenees.  The  methods  for  estimating  'A^'  are  diseussed  in 
[37]. 


2.9,6,  Census  Surveys,  Reeord  linkage,  or  eomputer  matehing,  is  a  means  of 
ereating  and  updating  information  that  may  be  used  in  surveys.  It  serves  as  a  means  of 
linking  individual  reeords  via  name  and  address  information  from  differing 
administrative  files.  Various  types  of  personal  information  sueh  as  name  and  address,  is 
linked  using  mathematieal  models  and  this  helps  in  the  proper  analysis  of  the  reeords. 

The  reeord  linkage  classifies  pairs  in  product  space  ‘AxB’  from  two  files  ‘A’  and 
‘B’  into  a  set  of  true  matches,  ‘Af  and  set  of  non-matches,  ‘U.’  In  order  to  find  if  the 
given  pair  forms  a  match  (or  a  link),  the  following  ratio  is  evaluated  [19]; 


^_p(aj  M) 
p{a\U) 


(2.15) 


where  represents  the  pair  under  consideration. 


To  find  if  forms  a  valid  link,  the  following  decision  rule  is  applied  on  ‘i?.’ 

If  i?  >  Tu,  then  designate  pair  as  a  link. 

If  T/  ^  ^  Tu,  then  designate  pair  as  a  possible  link  and  hold  for  review 

Iff?  <  Ti,  then  designate  pair  as  a  non-link. 

Here,  ‘  ’  and  ‘  7)  ’  are  the  thresholds  determined  from  prior  information.  The  ratio 


‘7?’  is  known  as  the  matching  parameter.  It  has  been  presented  in  [36]  that  these 
matching  parameters  can  be  estimated  directly  from  the  data  using  EM. 
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2.10.  CONCLUSIONS 

This  Section  discussed  the  underlying  concepts  of  the  EM  algorithm.  It  also 
showed  the  mathematical  formulation  of  the  algorithm.  The  application  of  the  EM 
algorithm  in  estimating  the  mixture  model  is  discussed.  Applications  of  the  EM  algorithm 
in  various  fields  are  also  presented. 


20 


3.  IMAGE  SEGMENTATION  USING  EM 

3.1.  INTRODUCTION 

Image  segmentation  is  the  proeess  of  division  of  an  image  into  distinet  region.  The 
level  to  whieh  this  division  is  earned  depends  on  the  type  of  details  needed.  Image 
segmentation  algorithms  are  generally  based  on  the  following  eriteria  [24]: 

(i)  Segmentation  based  on  similarity 

(ii)  Segmentation  based  on  diseontinuity 

In  segmentation  based  on  similarity,  all  regions  that  are  similar  as  per  some  eriteria 
are  elassified  as  one  region,  whereas  in  the  ease  of  segmentation  based  on  diseontinuity, 
eertain  diseontinuity  eriteria  sueh  as  edges  are  used  to  divide  images  into  regions.  This 
thesis  presents  a  teehnique  for  segmentation  based  on  similarity  using  EM  approaeh.  The 
eriterion  of  similarity  is  statistieal  in  nature,  i.e.  regions  that  are  statistieally  similar  are 
elassified  as  one  region.  Single  pixel  level  segmentation  and  segmentation  based  on 
spatial  distribution  are  performed.  In  ease  of  segmentation  based  on  spatial  distribution, 
spatial  eorrelation  of  the  data  is  eaptured. 

In  the  ease  of  a  minefield  deteetion  system,  the  sensor  is  seleeted  so  that  they  have 
good  signal  for  the  objeets  of  interest.  Thus  infrared  imaging  is  used  to  deteet  mines  with 
good  thermal  signatures.  Image  segmentation  ean  also  be  used  to  deteet  anomalies.  It  is 
beeause  the  anomalies,  due  to  their  statistieal  nature,  would  most  likely  belong  to  a  elass 
that  is  well  separated  from  the  baekground  elass.  This  would  help  in  their  deteetion  using 
the  eoneept  of  pixel  membership.  This  is  the  topie  of  diseussion  of  Seetion  4. 


3.2.  SEGMENTATION— SINGLE  PIXEL 

In  stoehastie  model-based  image  segmentation,  an  image  is  divided  into  regions. 
Eaeh  region  is  assoeiated  with  one  of  a  finite  number  of  elasses  and  eaeh  elass  eomes 
from  a  pre-defined  number  of  elasses  that  are  modeled  as  random  distributions.  Beeause 
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all  these  distributions  are  parametric  models,  a  problem  of  parameter  estimation  often 
arises.  The  EM  algorithm  is  an  excellent  technique  to  estimate  parameters  in  problems 
that  have  some  information  missing.  Once  the  parameters  are  estimated,  all  samples  can 
be  associated  with  classes.  Therefore  the  EM  algorithm  can  be  used  to  segment  the 
image. 

Eet  the  observed  data  vector  be,  y  =  \y^,y2,..;y j, . assuming  ‘n’ 

observations.  In  the  image  segmentation  model,  the  pixels  are  considered  as  samples. 
Therefore  ‘  y j  ’  denotes  the  pixel  intensity  of  the  j'^  pixel.  In  the  process  of  image 

segmentation,  as  previously  described,  each  pixel  intensity  belongs  to  a  particular  class 
that  is  unknown.  Additional  unobserved  variable  ‘  z.j  ’  is  added  to  the  unknown  variable 

list  which  is  an  indicator  variable  that  is  one  or  zero  according  to  whether  ‘  ’  (/  =  1,2... 

n)  arose  from  the  {i=  1,  2...  g)  class. 

It  is  assumed  that  each  class  has  a  Gaussian  distribution  with  the  unknown  mean 
and  variance  so  that  the  probability  density  function,  pdf,  for  the  i*’’  class  is  given  by: 

Piiy)  =  N(y;  ,  where  /  =  1,  2...g.  (3.1) 

Then  the  pdf  of  the  observed  data,  ‘  y  ,’  can  be  written  as: 

f{y  I  ^)  =  )  ■  (p  =  {n,,n„a^},  (3.2) 

(  =  1 

where  ‘  n.  ’  is  a  coefficient  that  denotes  proportion  of  the  sample  due  to  the  class  and 


TT,  =  1 . 


(=1 


(3.3) 
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Following  the  lines  of  the  prior  diseussion  in  Section  2.5,  the  log-likelihood 
function  of  the  complete  data  is  formed  as. 


g  n 

(3.4) 

i=i  j=i 


This  log-likelihood  function  can  also  be  written  as: 


ln[4(^|x)]  = 


(3.5) 


/=!  7=1 


/=1  7=1 


As  the  above  likelihood  function  is  linear  in  the  'E'  step  on  the  (k+lf' 
iteration  requires  the  calculation  of  the  expectation  of  ‘  z.j given  the  observed  data,  ‘  y 
This  expected  value  of  ‘  z^.  ’  is  obtained  as: 


(k)  _  2(k) 


f(y,  r) 


(3.6) 


where  is  the  Expectation  operator  over  the  current  parameter  ‘  ^3*^ ,’  and 


(3.7) 


gives  the  distribution  of  the  observed  data,  ‘  yj ,’  for  the  k*  iteration.  The  ‘Q’  function  is 
then  written  as: 


q((P,  (p" )  =  [ln{£,  {(p\x)]  I  . 


(3.8) 
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On  maximizing  the  ‘  Q  ’  function,  the  update  equations  are  obtained  for  the  various 
parameters.  These  equations  are  provided  in  appendix — A.  1.1. 

Therefore  with  the  iteration  of  the  expectation  and  maximization  step,  the  EM 
algorithm  not  only  gives  a  good  estimate  of  the  parameters  of  the  class  distribution  but 
also  tells  about  the  proportion  in  which  these  classes  are  mixed.  This  is  then  used  to 
model  the  data.  The  above  model  that  represents  the  observed  data  as  a  mixture  of  classes 
is  also  sometimes  referred  to  as  a  mixture  model  because  it  represents  a  mixture  of  ‘g’ 
classes,  each  with  a  pdf  of  its  own.  In  the  above  problem  all  classes  are  assumed  to  have  a 
Gaussian  distribution  with  each  class  having  a  mean  and  variance  of  its  own.  The 
distribution  can  be  any  parametric  distribution.  For  the  modeling  of  the  background, 
pixel-level  classification  has  been  done  using  Gaussian  mixture  model.  In  case  of 
modeling  of  the  RX  statistic,  the  classification  has  been  done  on  the  RX  detection 
statistic  using  Beta  and  Gamma  distributions  as  shown  in  Section  5. 


3.3.  SEGMENTATION— SPATIAL  DISTRIBUTION 

In  order  to  capture  spatial  correlation,  5 -dimensional  data  is  generated  from  the 
spatial  neighborhood  at  a  distance  of  two.  Figure  3.1  shows  a  scheme  for  sampling  the 
data  to  capture  the  local  spatial  variations  of  the  data.  Pixels  at  a  distance  of  two  are 
considered  since  4-neighborhood  pixels  are  expected  to  be  highly  correlated  and  do  not 
capture  useful  spatial  characteristics  of  the  local  neighborhood.  Taking  a  large 
neighborhood  with  more  samples  increases  the  dimensionality  of  the  model  which  is 
computationally  expensive. 

Since  in  this  case  the  segmentation  is  based  on  spatial  distribution,  segmentation 
reflects  spatial  correlation  in  the  data.  The  class  assignment  depends  not  only  on  the  gray 
value  of  the  pixel  but  also  on  the  gray  values  of  the  nearby  positions  with  respect  to  the 
center  pixel.  The  Gaussian  mixture  model  discussed  in  Section  3.2  is  still  valid  with  the 
only  difference  that  a  multivariate  normal  distribution  is  used  to  define  the  mixture 
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model.  The  update  equations  for  multivariate  Gaussian  mixture  model  are  provided  in 
appendix — A.  1.1. 


Figure  3.1.  Spatial  Distribution  of  the  Pixels. 


3.4.  PIXEL  LEVEL  CLASS  ASSIGNMENT 

Assuming  there  is  a  512x640  image,  it  has  327,680  samples,  (jj  to  7327680)-  If 

given  image  is  modeled  by  ‘g’  classes,  then  each  one  of  these  samples  can  be  assumed  to 
belong  to  one  of  the  ‘g’  classes.  Before  the  start  of  the  algorithm,  each  pixel  is  assigned  a 
class  randomly  with  equal  probability,  since  there  is  no  estimate  of  the  initial  parameters. 
It  is  to  be  noted  here  that  the  information  known  to  the  algorithm  are  the  observed  data, 
the  initial  guessed  values  and  the  number  of  classes.  After  the  algorithm  converges,  array 
of  indicator  variables,  ‘  Zy ,’  can  be  used  to  assign  classes  to  all  the  samples.  If  there  are 

‘n’  samples  and  ‘g’  classes  then  the  ‘  Zy  ’  array  has  ‘g’  rows  and  ‘n’  columns.  Consider 

the  column  (/  =  1  to  n).  The  row  (i  =  1  to  g),  of  the  column  represents  the 
probability  of  the  sample  belonging  to  the  class.  The  row  that  has  the  maximum 
probability  for  a  given  column,  j,  represents  the  class  assigned  to  the  sample.  The 
segmentation  is  achieved  by  representing  all  samples  that  belong  to  a  particular  class  as  a 
distinct  region  in  the  image. 
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3.5  RESULTS— SEGMENTATION  USING  SINGLE  PIXEL 

Figures  3. 2-3 .4  show  the  results  of  segmentation  by  two,  three  and  four  elasses.  In 
these  figures,  part  (a)  shows  the  image  that  has  been  segmented  by  the  given  number  of 
elasses,  part  (b)  shows  the  segmented  image  and  part  (e)  shows  the  probability  density 
funetion  (pdf)  of  the  eonstituent  elasses.  The  sum  of  these  pdfs  models  the  histogram  of 
the  samples.  The  eolor  on  the  segmented  image  and  the  line  eolor  of  the  pdfs  are 
matehed  for  eonvenienee  of  interpretation.  The  histogram  of  the  samples  has  been 
obtained  by  sampling  the  image  (row-wise  and  eolumn-wise)  with  a  sampling  rate  of 
five.  The  sample  pdf  has  been  plotted  using  the  red  dash-dot  line  in  part  (e)  of  the  Figures 
3.2-3.4.  A  good  mateh  between  the  sum  of  elass  pdfs  and  the  sample  pdf  shows  the 
aeeuraey  of  segmentation  and  modeling  with  the  given  number  of  elasses. 

The  legend  of  the  figure  shows  the  fraetional  membership  of  eaeh  mixture 
eomponent.  For  example,  in  Figure  3.2  (e)  the  proportion  of  Class- 1  is  0.32  whereas  the 
proportion  of  Class-2  is  0.68.  The  two  regions  ean  be  elearly  seen  in  the  image.  One  is 
the  baekground  and  the  other  is  region  formed  by  the  traek  marks  of  a  vehiele.  As  ean  be 
seen  from  the  pdfs  of  the  elasses,  the  region  formed  by  the  traeks,  forms  the  minor  elass 
that  has  the  proportion  of  0.32.  In  the  ease  of  three  elasses,  the  image  ean  be  divided  into 
three  regions.  The  first  region  eonsists  of  the  baekground,  the  seeond  eonsists  of  the 
vegetation  and  the  third  region  is  the  slightly  bright  region  surrounding  the  vegetation. 
The  pdfs  elearly  show  that  the  proportion  of  the  elasses  eonsisting  of  the  seeond  and  third 
region  is  less  than  that  of  the  major  elass  that  eonsists  of  the  baekground. 

In  the  ease  of  four  elasses,  the  baekground  is  divided  into  two  regions  based  on  the 
type  of  soil,  i.e.  dark  and  light.  There  are  some  bright  regions  in  the  image  that  most 
likely  represent  trees.  These  bright  regions  are  further  divided  into  the  other  two  elasses, 
whieh  have  smaller  memberships. 

The  results  (Figures  3. 2-3 .4)  show  that  the  EM  algorithm  was  able  to  segment  the 
image  well  for  all  the  three  eases.  It  ean  also  be  seen  from  these  figures  that  the  sum  of 
the  constituent  pdfs  very  well  modeled  the  sample  pdfs. 
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Y1G01P1LEN355:  Segmented  Image,  modeled  by2  classes 


(a)  Actual  Image 


(b)  Segmented  Image 


Y1G01P1LEN355  (Modeled  by  2  classes) 


Figure  3.2.  Example  of  Image  Segmentation  Using  Two  Classes. 
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(a)  Actual  Image 


(b)  Segmented  Image 


Y1F01P2LFN152  (Modeled  by  3  classes) 


(c)  Modeling  by  Three  Classes 

Figure  3.3.  Example  of  Image  Segmentation  Using  Three  Classes. 


PDF  of  classes  - 
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(a)  Actual  Image 


(b)  Segmented  Image 


Y1G01P1LEN395  (Modeled  by  4  classes) 


Figure  3.4.  Example  of  Image  Segmentation  Using  Four  Classes. 
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3.6.  RESULTS— SEGMENTATION  USING  SPATIAL  DISTRIBUTION 

Figure  3.5  shows  the  results  for  the  segmentation  based  on  spatial  distribution  from 
two  representative  frames  of  the  May  2003  data.  Figure  3.5  (a)  and  (e)  show  the  image 
frames  whereas  Figure  3.5  (b)  and  (d)  show  the  corresponding  segmented  images. 


Y1F01P2LFN120 


(a) 


Y1F01P2LFN120:  Segmented  Image,  modeled  by6  classes 


'.‘X 

V 

*  ■  V  .... 

4k 

r  »•  --wT'X  '  -c  ^  ^  r 

^  t/f.*  >  -  J  ..vV.-  ~v 

,.C,  rf'V  .  .  /  \  '  :?  V  V 

V 

(b) 


30 


In  this  case  the  segmentation  eaptures  both  gray  value  features  (like  vegetation  soil 
ete)  as  well  as  the  transition  from  one  type  of  region  to  another.  Beeause  of  this  reason 
the  edges  formed  due  to  the  transitions  in  the  baekground  areas  (for  example  due  to  the 
eolor  of  the  soil)  are  eaptured  and  shown  as  small  patehes  in  the  segmented  image. 

3.7.  CONCLUSIONS 

In  this  seetion,  single  pixel  level  segmentation  of  the  MWIR  images  is  diseussed. 
Multi  (five)  band  data  is  generated  from  spatial  neighborhood  and  segmentation  based  on 
spatial  distribution  is  performed.  The  results  of  these  two  types  of  segmentation  are 
shown  and  diseussed. 
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4.  ANOMALY  DETECTION  USING  EM 

4.1.  INTRODUCTION 

In  this  section,  various  methods  of  anomaly  detection  are  presented.  First  of  all,  a 
brief  literature  of  anomalies  or  outliers  (as  they  are  called)  is  presented.  This  is  followed 
by  the  study  of  some  of  the  recent  EM-based  techniques  that  have  evolved  to  detect 
anomalies.  All  of  these  require  a  mixture  model.  This  section  uses  the  results  of  EM- 
based  segmentation  that  have  been  obtained  in  the  previous  section.  In  other  words,  the 
anomaly  detection  stage  comes  after  the  pixel-level  class  assignments  have  been  made. 
Therefore  once  the  background  data  has  been  modeled  with  a  Gaussian  or  some  other 
mixture  model,  the  concept  of  pixel  membership  is  employed  to  define  different  EM- 
based  detection  statistics. 

4.2,  ANOMALIES  AND  OUTLIERS 

The  values  that  are  away  from  the  mainstream  data  are  called  as  anomalies.  Thus 
anomalies  or  outliers  can  be  broadly  described  as  observations  (or  subset  of  observation) 
that  “appear  to  be  inconsistent”  with  the  remainder  of  that  set  of  data  [12].  The  crucial 
point  is  that  the  phrase  “appear  to  be  inconsistent”  is  subjective  because  it  is  a  matter  of 
subjective  judgment  on  the  part  of  the  observer  to  declare  a  particular  observation  as 
inconsistent.  In  the  case  of  anomaly  detection  using  EM-based  algorithms,  the  outliers  are 
said  to  be  inconsistent  because  they  are  different  from  the  background  in  a  statistical 
sense. 


The  outlier  problem  is  two-fold.  Eirst,  it  is  necessary  to  determine  whether  the 
observation  is  really  an  outlier.  Once  this  is  decided,  the  next  step  is  to  find  an  efficient 
way  to  deal  with  these  outliers.  Rejection  of  outliers  may  not  be  an  efficient  way  in 
certain  situations.  Thus  the  methods  for  processing  the  outliers  take  an  entirely  relative 
form  and  differ  from  situation  to  situation.  Eor  the  current  discussion,  these  outliers  are  of 
interest  as  possible  indications  of  presence  of  mine-like  targets. 
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Anomaly  detection  is  extensively  used  within  the  field  of  target  detection  and 
computer  security  [13].  In  the  case  of  target  detection,  the  process  of  anomaly  detection  is 
comprised  of  finding  the  data  samples  that  are  different  from  the  other  samples  in  its 
neighborhood  in  some  way.  In  the  case  of  intrusion  detection  (in  computer  security),  the 
call  traces  are  observed  for  sequences  that  are  different  from  the  others,  and  these  are 
then  categorized  as  anomalous  or  intrusions  [13].  Anomaly  detection  for  hyper  spectral 
imagery  (HSI)  is  a  useful  technique  for  detecting  objects  of  military  interest  [14].  Once 
the  anomalies  are  detected,  they  can  be  passed  to  a  classifier  in  order  to  determine  the 
exact  nature  of  the  unusual  object. 

The  remaining  part  of  this  section  presents  different  EM-based  anomaly  detectors 
that  use  the  concept  of  pixel  membership.  In  all  these  detectors,  it  is  assumed  that  the 
anomalies,  being  different  from  the  background,  would  form  a  class  that  is  well  separated 
from  the  background.  All  these  EM-based  detectors  are  compared  with  the  RX  anomaly 
detector.  Therefore  the  next  section  gives  a  brief  review  of  the  RX  anomaly  detector. 


4,3.  THE  RX  ANOMALY  DETECTOR 

The  RX  algorithm  as  suggested  by  Reed  and  Yu  [25]  can  be  used  for  multi-band 
images  with  a  zero  mean,  uncorrelated  Gaussian  background.  Eor  a  ‘J’  band  image  ‘7,’ 
the  RX  statistics,  rj^ ,  at  any  location  is  given  as: 


Tr  Aj,  M  fj.g  ]  , 


(4.1) 


where  ‘  M  ’  is  the  locally  estimated  covariance  matrix  of  dimension  "Jxf  given  by: 


M 


N 

c 


(4.2) 
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‘  ’  is  the  mean  target  signature  and  is  given  by: 


Ms  = 


(4.3) 


T  l^Wr 


Here,  ‘  number  of  clutter  pixels  and  '  Nj.'  is  the  number  of  target  pixels 

which  are  given  as: 


Nc=Ar^-r^), 


(4.4) 


Nj  =  Tir^ . 


(4.5) 


This  thesis  considers  only  single  band  data,  and  therefore  J  =  \  .  The  RX  statistic  in 
this  case  is  given  as: 


Nj 


—  Za’ 

A'c 


,2  A 


=  N, 


v^y 


(4.6) 


4,4,  SEM-BASED  ANOMALY  DETECTORS  IMPLEMENTED 

This  section  will  discuss  three  SEM-based  anomaly  detectors.  All  these  detectors 
are  based  on  the  Gaussian  mixture  model  of  the  image.  The  mixture  model  is  estimated 
using  the  SEM  algorithm.  Once  the  SEM  divides  all  the  image  pixels  into  classes  and 
estimates  the  class  parameters,  one  is  ready  to  enter  the  SEM-based  anomaly  detection 
stage.  The  detectors  that  have  been  implemented  are  discussed  in  the  following  sub¬ 
sections. 
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4,4.1.  Detector  Using  Pixel  Membership,  Sometimes  the  target  pixels  are  not 
elearly  resolved  from  the  background,  but  they  do  appear  somewhat  atypical  relative  to 
the  densities  that  define  the  mixture  model.  This  fact  is  exploited  by  the  SEM  detector  for 
detecting  poorly  resolved  targets. 

In  this  method,  the  likelihood  of  pixels  is  evaluated  for  the  presence  of  anomalies. 
If  the  target  pixels  are  separated  out  from  the  background,  then  their  membership  with 
respect  to  the  background  is  low.  Thus  the  pixels  for  which  the  likelihood  is  sufficiently 
low  can  be  declared  as  targets.  To  test  for  the  presence  of  anomalies,  the  metric  used  over 
a  target  window,  W  is  given  as  [17]; 


r  =  -y  In 


^In  J^7r,p,(yJ 


i=l 


(4.7) 


where,  pj,  is  the  pixel  value  of  the sample,  "In’’  is  the  natural  logarithm,  ' g'  is  the 
number  of  classes,  ‘  ’  is  the  probability  of  f'  class  and  {y j )  is  the  class  membership 

for  pixel  ‘jy .’  The  size  of  the  window  'W'  should  be  matched  to  the  typical  size  of  the 
expected  anomaly. 

The  test  statistic  ‘T,  ’  gives  a  high  value  at  the  locations  of  the  anomalous  pixels. 

4,4,2,  Detector  Using  Locally  Optimum  Bayes  Detection  (LOBD),  In  this 
detector,  a  locally  optimum  detection  algorithm  is  applied  for  detection  of  random  signals 
that  occur  in  the  mixture  noise.  The  LOBD  statistic  for  a  discrete  zero-mean  signal  in 
mixture  noise  is  derived  under  the  assumption  that  the  signal  and  noise  are  independent. 
The  detection  statistic  is  used  to  distinguish  between  the  following  null  and  non-null 
hypotheses: 


H,  : Ty  e  Nj 
H,  :y.  e^f^  +  Nj, 


(4.8) 
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where  j  are  the  pixels  in  some  neighborhood  window  W,  ‘s'  is  the  unit  varianee 
signal  template,  ‘a’  is  the  unknown  varianee  of  the  reeeived  signal  and  ‘  N ^  ’  is  the  7* 
noise  sample. 

If  the  mixture  noise  is  Gaussian,  then  a  loeally  optimum  test  statistie  has  been 
derived  in  [16].  This  is  given  as: 


J^fV  i=l 


-2  \\yj 
-^  +  — - 


|2  7 


cr. 


cr. 


(4.9) 


where. 


p[k\yj,HQ)= 
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Here,  p{k\y j,Hq)  is  the  probability  of  elass  ‘k'  for  null  hypothesis,  given  the 
input  samples.  The  deteetion  statistie,  T{yj),  gives  a  high  value  for  the  anomalies. 


It  ean  be  noted  that  the  statisties  diseussed  in  seetion  4.4.1  and  4.4.2  are  based  on 
the  segmentation  of  the  data  representing  the  whole  image.  As  a  result  they  are  not 
loeally  adaptive.  In  the  next  seetion,  a  loeally  adaptive  anomaly  deteetor,  based  on  multi- 
elass  RX  is  diseussed.  This  algorithm  has  been  adapted  from  the  one  proposed  by  Dr. 
Fries  for  the  airborne  program  [15]. 

4,4,3.  Detector  Using  Multi-Class  RX,  It  is  known  that  SEM  segments  an  image 
into  homogeneous  regions  based  on  the  intensity  at  eaeh  pixel.  The  intensity  at  eaeh  pixel 
is  then  tested  against  the  regions  that  surround  it  to  measure  the  degree  to  whieh  it  stands 
out.  This  can  be  operated  on  a  window  that  moves  down  through  the  image.  This  can  be 
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done  by  means  of  a  likelihood  ratio  test  [15]  where  the  numerator  is  likelihood  of  the 
observed  speetrum  assuming  the  target  is  present  and  denominator  is  likelihood  of  the 
observed  speetrum  assuming  the  target  is  not  present. 


Let  the  probability  of  the  pixel  belonging  to  the  elass  be  given  by: 


pAh,^)= 


p[y  A 

ZP[yj\(^Bm^/^Bm) 

m 


(4.11) 


Likelihood  of  the  elass  in  a  neighborhood  under  the  window,  ‘  WJ  is  given  by: 


Nj 


(4.12) 


Then,  the  likelihood  of  ‘  y  ’  under  null  and  non-null  hypothesis  is  given  by: 


p^,  I ".) = Z,  )  ■ 


(4.13) 

(4.14) 


The  likelihood  ratio,  ‘  A  ’  is  given  as: 


.p{y,\Hj 

Piyjlffy 


(4.15) 


The  deteetion  statistie,  ‘A,’  gives  a  high  value  for  the  anomalous  pixels.  If  the 
anomaly  target  region  is  known  to  be  of  some  size  ‘  W  then  the  test  statistie  for  the  log- 
likelihood  can  be  written  as: 

A  =  log(A)  =  [logMTyl^ .4  )}-  log{^(Tyl^ 0  )}]  • 

jGiV  JgW 


(4.16) 
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4,5.  THE  AIRBORNE  IMAGERY  AND  THE  MINE  DISTRIBUTION 

This  section  presents  an  overview  of  the  data  processed.  The  image  data  in  airborne 
minefield  detection  is  collected  by  a  low  flying  aircraft  over  different  terrain  during  four 
different  times  of  the  day.  Each  flight  of  data  collection  is  called  a  run  which  consists  of  a 
certain  number  of  images  (called  frames)  captured  at  approximately  8  frames  per  second. 
Each  image  is  of  512  x  640  pixels  in  dimension.  The  data  is  collected  for  different  mid 
wave  infra  red  (MWIR)  bands  and  over  four  different  times  of  the  day,  typically — early 
morning,  afternoon,  evening  and  night. 

The  October  2002  data  consists  of  15  runs.  The  15  runs  contain  data  from  three 
missions  over  different  times  of  the  day  and  bands.  There  are  a  total  of  2143  frames 
(143x14  +  141x1).  This  particular  collection  of  data  consists  of  daytime  data  only.  The 
primary  terrain  for  the  data  collection  can  be  classified  as  rocky  arid  areas  with  some 
vegetation.  Also,  the  data  present  contains  only  surface  mines  in  it.  The  ground  truth  is 
available  for  371  mine  targets,  consisting  of  190  large  surface  mine  targets  and  181  small 
surface  mine  targets.  Ground  Truth  is  also  available  for  430  fiducials.  These  are  rejected 
from  the  false  alarms  at  the  time  of  analysis.  Only  daytime  data  was  available  under  this 
data  set. 

The  dataset  for  May  2003  is  the  biggest  collection  of  data  available  for  analysis. 
The  primary  terrain  with  mines  can  be  classified  as  arid  with  little  or  no  vegetation.  In 
this  data  collection,  data  showing  background  alone  has  also  been  collected  in  addition  to 
data  showing  both  background  and  minefields.  The  collection  consists  of  37765  image 
frames  from  83  runs  over  mine  areas  from  ten  missions  covering  different  times  of  the 
day  and  bands.  It  also  contains  55  runs  over  background  only  areas  from  nine  missions 
covering  different  times  of  day  for  full  band  (Band  0  and  Band  1).  The  ground  truth  is 
available  for  12391  mine  targets  of  which  there  is  1748  large  surface  mine  targets,  844 
small  surface  mine  targets,  3946  medium  surface  mine  targets  and  5853  buried  mine 
targets  [23],  [35]. 
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These  two  data  collections  together  encompass  a  number  of  mine  and  minefield 
modalities  such  as  terrain  type,  band,  mine  signatures  and  minefield  performances.  Some 
of  the  characteristics  of  the  airborne  imagery  data  [23]  that  spans  across  all  the  data 
collections  are  listed  as  follows: 

•  Mine  Types:  Different  types  of  mines  can  be  classified  based  on  size  (large, 
medium,  small),  composition  (metal,  plastic)  or  method  of  placement  (surface, 
buried): 

o  Large  surface  mines  :  LP_B  (plastic)  and  LM_A  (metal) 
o  Medium  surface  mines  :  MP  A  (plastic) 
o  Small  surface  mines  :  SM_A  (metal) 

o  Large  buried  mines  :  LM_A_B  (large  metal),  LM_C_B  (large  metal)  and 
LP_D_B  (large  plastic). 

o  Medium  buried  mines  :  MP_A_B  (medium  plastic) 

•  Spectral  Bands:  The  four  spectral  bands  in  the  MWIR  spectrum  range  for  which 
the  data  is  collected  are  identified  as: 

o  Full  band  (3-5  pm) — Band  0,  Band  1 
o  Solar  band  (3-4  pm)  — Band  2 
o  Thermal  band  1  (4-4.5  pm)  — Band  3 
o  Thermal  band  2  (4.5-5  pm)  — Band  4 

The  various  detection  statistics  discussed  in  section  4.4  were  performed  on  the  two 
daytime  datasets  from  May  2003  data.  The  mine  distribution  for  surface  and  buried  mines 
for  these  datasets  is  shown  in  Table  4.1. 
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Table  4.1.  Mine  Distribution  for  Surfaee  and  Buried  Mines. 


Mine  Name 

Y1F01P2LFN 

Y1F01P2LDN 

Total 

LM_A 

12 

12 

24 

MPA 

47 

50 

97 

LP_B 

12 

9 

21 

SM_A 

12 

12 

24 

(a)  Surfaee  Mines 


Mine  Name 

Y1F01P2LFN 

Y1F01P2LDN 

Total 

LM_A_B 

12 

12 

24 

MP_A_B 

12 

13 

25 

LM_C_B 

40 

41 

81 

LP_D_B 

11 

8 

19 

(b)  Buried  Mines 


Figure  4.1  shows  the  mine  frames  from  this  data  (daytime).  The  various  mines  and 
elutter  are  marked  using  ‘  x  ’  marks.  The  red  eolor  is  used  to  represent  the  mines  whereas 
the  eyan  eolor  is  used  to  represent  the  elutter  or  fidueial  markers.  Figure  4.1  (a)  shows  the 
frame  with  only  surfaee  mine  signatures.  Figure  4.1  (b)  shows  the  frame  with  only  buried 
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mine  signatures.  Figures  4.1  (e)  and  (d)  show  the  frames  with  both  surfaee  and  buried 
mine  signatures. 


Y1F01P2LDN124 


Y1FD1P2LDN118 


(a) 


(b) 


Y1F01P2LFN138  Y1 F01 P2LFN136 


(c) 


(d) 


Figure  4.1.  Mine  Frames  Showing  Surface  and  Buried  Mines  from  the  May  2003  Data. 


Figure  4.2  shows  some  of  the  typical  mine  signatures  of  various  surface  mines. 
Typical  surface  mine  signature  shows  a  bright  reflection  and  a  prominent  shadow  feature. 
In  case  of  MP  A  and  SM_A  mines  the  reflection  is  not  prominent.  It  is  the  signature  due 
to  shadow  that  is  captured  by  different  anomaly  detectors. 
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Mine  Name:  LM_A  Mine  Name:  MP_A 


(a)  LP_B  (b)  SM_A 

Figure  4.2.  Individual  Mine  Signatures  of  Surface  Mines. 


Figure  4.3  shows  some  of  the  typical  mine  signatures  of  different  buried  mines.  It 
can  be  seen  that  the  buried  mines  have  a  weak  signature.  The  mine  signature  of  the  buried 
mines  is  bigger  than  the  physical  size  of  the  mine.  This  is  because  the  signature  mainly 
represents  the  disturbed  soil.  Therefore  for  the  detection  of  these  mines  a  larger  target 
window  is  required  for  better  detection  performance. 
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Mine  Name:  LM_A_B 


_ ■  ■  _ I _ »  **- 

(a)  LM_A_B 


Mine  Name:  MP_A_B 


(b)  MP_A_B 


Mine  Name:  LM_C_B 


Mine  Name:  LP_D_B 


(c)  LM_C_B  (d)  LP_D_B 

Figure  4.3.  Individual  Mine  Signatures  of  Buried  Mines. 


4.6.  RESULTS 

Figure  4.4  shows  the  MWIR  image  from  one  of  the  datasets  of  May  2003  data.  It 
also  shows  the  segmented  image  and  the  output  of  the  various  deteetors  diseussed  in 
Seetion  4.3  and  4.4. 
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Y1F01P2LDN135:  Segmented  Image 


(a)  Raw  Image  Data 


(b)  Segmented  Image 


SEM  -  Pixel  Membership 


SEM  -  Locally  Optimum  Detection 


(c)  Detector  Output  for  Pixel  Membership 


(d)  Detector  Output  for  LOBD 


SEM  ■  Multi  Class  RX 


(e)  Detector  Output  for  Multi-Class  RX  (f)  Detector  Output  for  RX  Detector 
Figure  4.4.  Raw  Image  Data,  Segmented  Image  and  Outputs  of  Various  Detectors. 
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Figure  4.4  (a)  shows  one  of  the  mine  frames  from  the  May  2003  data.  The  frame 
shows  three  surfaee  mine  types  (LM_A,  LP_B  and  SM_A),  and  one  buried  mine  type, 
LM_A_B.  It  ean  be  seen  that  the  surfaee  mine  signatures  eonsists  mainly  of  the  shadows. 
It  ean  also  be  seen  that  in  ease  of  LM_A  and  LP_B  the  signatures  partly  eonsist  of 
refleetions  as  well.  On  the  whole  the  signatures  of  the  surfaee  mines  have  a  high  eontrast. 
On  the  other  hand,  the  signatures  of  the  buried  mines  are  weak. 

Figure  4.4  (b)  shows  the  segmented  image  using  four  elasses.  The  elass  eontaining 
surfaee  mines  is  basieally  of  shadow  and  is  a  minority  elass.  Also  this  elass  is  reasonably 
separated  from  the  baekground  elass.  The  signatures  of  the  buried  mines  are  weak.  These 
signatures  basieally  eonsist  of  the  disturbed  soil  and  therefore  they  belong  to  a  elass  that 
is  not  very  well  separated  from  the  baekground  elasses. 

Figure  4.4  (e)-(e)  shows  the  outputs  of  different  SEM-based  deteetors,  whereas 
Figure  4.4  (f)  shows  the  output  of  the  RX  anomaly  deteetor.  The  outputs  of  the  EM 
deteetors  using  pixel  membership  and  multi-elass  RX  (parts  (e)  and  (e)  respeetively)  are 
on  a  log  seale  and  therefore  show  a  large  variation  in  intensity  of  different  regions.  It  ean 
be  noted  that  the  statisties  based  on  pixel  membership  and  Bayes  deteetion  (parts  (e)  and 
(d)  respeetively)  are  based  on  the  segmentation  of  the  data  representing  the  whole  image, 
whereas  the  statistie  based  on  multi-elass  RX  (part  (e))  is  operated  over  a  window.  Henee 
the  deteetor  based  on  multi-elass  RX  tends  to  deteet  loeal  anomalies.  This  ean  be  seen  by 
observing  the  output  of  this  deteetor  (part  (e))  of  Eigure  4.4. 

Eigure  4.4  (f)  shows  the  output  of  the  RX  anomaly  deteetor.  The  RX  deteetion 
statistie  is  basieally  the  signal-to-elutter  ratio  as  diseussed  in  seetion  4.3.  This  statistie  has 
a  high  value  if  the  anomalies  have  a  high  eontrast  signature  (high  signal  strength)  and  are 
surrounded  by  a  homogeneous  region  (low  elutter  varianee  and  low  elutter  strength)  in  its 
neighborhood.  As  ean  be  seen  from  part  (a)  of  the  figure  that  the  surfaee  mines  have  a 
high  eontrast  signature  and  they  are  mostly  surrounded  by  a  relatively  homogeneous 
neighborhood.  As  a  result,  surfaee  mines  are  well  deteeted  by  the  RX  deteetor. 
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In  order  to  compare  the  performance  of  the  detectors,  ROCs  have  been  generated 
using  the  different  SEM-based  detectors  discussed  in  Section  4.4.  These  detectors  are 
compared  with  the  RX  anomaly  detector.  All  the  ROCs  show  the  probability  of  detection, 
Pd,  achieved  until  the  False  Alarm  Rate  (FAR)  of  0.1/m  .  There  are  eight  sub-plots,  each 
corresponding  to  one  of  the  eight  types  of  mines  listed  in  Section  4.5.  The  title  of  these 
figures  shows  the  mine  type  and  the  number  of  such  mines  in  the  data.  For  example, 
FM_A  (24)  in  Figure  4.5  (a)  tells  that  there  are  24  FM_A  mines  in  the  data.  The  ROCs 
are  calculated  over  the  frames  with  mines  only.  Figures  4.5  shows  ROCs  calculated  for 
surface  mines.  Figure  4.6  shows  the  ROCs  calculated  for  the  buried  mines. 
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Figure  4.5.  ROC  Curves  for  Different  Surface  Mines. 
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SEM  Vs  RX:  LM_A_B  (24) 


SEM  Vs  RX:  MP_A_B  (25) 
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Figure  4.6.  ROC  Curves  for  Different  Buried  Mines. 


The  observations  of  this  seetion  follow  from  the  discussion  in  section  4.6.  As  can 
be  seen  from  the  mine  frames,  the  surface  mines  produce  a  high  contrast  signature.  The 
target  mask  used  in  RX  [23],  [35]  covers  the  signature  very  well  and  produces  a  high 
value  of  the  signal  strength.  The  background  is  generally  homogeneous  and  therefore  the 
clutter  variance  is  low.  Thus,  the  RX  anomaly  detector  performs  very  well  for  the  surface 
mines.  Because  of  the  high  contrast  signature,  the  surface  mine  pixels  are  well  separated 
from  the  background  classes.  Therefore  SEM-based  detectors  perform  well  for  these 
mines  as  well.  On  the  other  hand,  the  detector  based  on  multi-class  RX  detects  lots  of 
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false  alarms  because  the  detector  operates  over  a  window  and  therefore  detects  local 
anomalies.  Overall  performance  of  the  RX  detector  for  surface  mines  is  better  than  (or 
comparable  to)  that  of  the  EM-based  detectors  because  of  a  relatively  homogeneous 
background. 

The  buried  mines  have  a  very  weak  signature  that  is  not  well  separated  from  the 
background.  Due  to  this  weak  signature,  the  signal  strength  is  very  low  and  for  this 
reason  the  performance  of  the  detectors  is  poor  for  the  buried  mines.  In  case  of  the  SEM- 
based  detectors,  the  class  containing  buried  mines  is  not  well  separated  from  the 
background  class  and  there  are  many  regions  in  the  background  that  are  in  the  same  class 
as  the  mine  area.  Eor  example,  on  observing  parts  (a)  and  (b)  of  Eigure  4.4  it  can  be  seen 
that  the  regions  of  dark  soil  belong  to  the  same  class  that  contains  buried  mines.  Thus,  the 
performance  of  the  SEM-based  detectors  also  goes  down. 

It  is  clear  from  the  mine  signatures  of  the  buried  mines  that  a  bigger  target  radius  is 
required  to  capture  the  mine  signatures  of  these  mines.  In  case  of  RX,  a  large  clutter 
radius  introduces  non-homogeneities  of  the  background  which  increases  the  clutter 
variance  and  thus  reduces  the  detection  statistics.  However,  the  SEM-based  detectors  do 
not  suffer  from  the  disadvantages  of  the  higher  mask  size  because  target  pixels  are 
evaluated  against  a  set  of  background  class.  The  mask  size  can  be  increased  to 
accommodate  the  complete  mine  signature  and  correspondingly  larger  background  area 
can  be  used  to  evaluate  the  detection  statistics. 

Thus,  in  the  case  of  buried  mines,  the  performance  of  the  SEM  detectors  is  likely  to 
be  better  than  the  RX  anomaly  detector  in  couple  of  cases.  In  the  above  results  for  the 
case  of  LM_A_B  mines  (Eigure  4.6  (a))  SEM  MCRX  is  better  performing  than  RX.  In 
case  of  MP_A_B  mines  (Eigure  4.6  (b))  SEM_PM  performs  better  than  RX.  Overall,  the 
results  are  comparable. 
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4,7.  CONCLUSIONS 

In  this  section,  the  problem  of  anomaly  detection  using  image  segmentation  has 
been  presented.  The  various  SEM-based  anomaly  detectors  that  use  the  concept  of  the 
anomaly  class  to  separate  the  anomalies  from  the  background  have  also  been  presented. 
The  SEM-based  anomaly  detectors  are  compared  with  the  RX  anomaly  detector.  The 
performance  of  the  surface  and  buried  mines  has  also  been  analyzed  for  these  detectors. 
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5.  MODELING  OF  DETECTION  STATISTICS  USING  EM 

5.1.  OVERVIEW 

In  the  previous  seetion,  segmentation  of  the  baekground  data  into  distinct  classes 
was  discussed.  This  section  shows  that  the  EM  algorithm  can  also  be  used  to  model  the 
output  of  detection  statistics  such  as  RX.  The  modeling  of  the  detection  statistic  is  very 
important  since  it  represents  the  spatial  correlations  and  the  local  non-homogeneities  in 
the  data.  Also  in  order  to  compare  and  quantify  the  performance  of  an  anomaly  detector 
such  as  RX  over  different  terrain,  it  is  necessary  to  model  the  detection  statistic  into 
probabilistic  models.  This  modeling  also  helps  in  the  adaptive  Constant  False  Alarm  Rate 
(GEAR)  threshold  selection  to  locate  potential  targets  as  discussed  in  Section  6. 

Various  parametric  distributions  such  as  Beta  and  Gamma  distributions  are  used 
for  modeling  the  detection  statistic.  These  distributions  have  been  used  in  modeling  due 
to  their  modeling  flexibility.  Once  the  background  data  has  been  modeled  with  various 
mixture  models,  the  model  that  fits  the  data  most  accurately  can  be  determined.  This  is 
done  by  using  various  test  statistics.  In  the  end,  the  performance  of  various  mixture 
models  is  compared  based  on  the  test  statistics. 


5.2.  THE  RX  STATISTIC 

The  RX  test  statistic  for  a  single-band  image  data  has  been  discussed  in  Section 
4.4.  The  RX  statistic  follows  an  F-distribution  for  clutter  and  is  given  by: 
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The  above  equation  is  of  the  form, 


v^iy 


v^2y 


,  where  and  Xj  are  the  Chi- 


Square  variables  with  degrees  of  freedom  Vj  =  1  and  Vj  =  .  Let,  r  =  — ^  = 


It  has  been  shown  in  [46]  and  [47]  that  the  output  of  the  RX  anomaly  deteetor  ean 
be  modeled  by  a  Gamma  mixture  model.  This  thesis  uses  the  two  parameter  Gamma 
mixture  model  to  model  the  detection  statistic,  V .  The  Gamma  model  used  is  given  by: 

G{r:X,k)  = -  ,  0<r<oo;U>0,  (5.2) 

r(yi:) 

where,  r(A:)  is  the  complete  Gamma  function. 

The  Gamma  model  has  received  widespread  importance  in  the  modeling  because  it 
represents  the  distribution  of  sum  of  squares  of  independent  normal  variables.  It  is  used 
extensively  in  the  modeling  of  the  SAR  data  [27].  One  of  the  most  important  models  of 
the  SAR  data  is  the  multiplicative  model.  Here  the  SAR  data,  ‘Z,’  is  modeled  as  the 
product  of  two  independent  random  variables,  ‘X  and  ‘T.’  Therefore: 


Z  =  XT  (5.3) 

The  random  field  ‘X  models  the  backscatter  that  depends  on  the  area  each  pixel 
belongs  to,  while  ‘F  takes  into  account  the  fact  that  the  SAR  images  are  the  result  of  a 
coherent  imaging  system  [26].  The  Gamma  distributions  have  been  proposed  for  ‘X  and 
‘F  because  of  their  excellent  modeling  capability  of  heterogeneous  areas  such  as  forests. 
The  modeling  of  SAR  images  using  the  multiplicative  model  of  Gamma  distributions  is 
also  done  in  [28].  Gamma  mixture  models  have  been  used  for  target  recognition  by  Webb 
[32],  who  uses  the  EM  algorithm  to  estimate  the  parameters  of  a  Gamma  mixture  model. 
The  Gamma  mixture  model  approach  has  also  been  used  for  target  recognition  in  [33]. 
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Now  consider  the  following  transformation  on  V’: 


x  =  — re[0,oo),  xe[0,l]. 

r +  1 


(5.4) 


With  this  transformation,  the  probability  density  funetion  for  the  test  funetion  value 
(‘  X  ’)  under  null  hypothesis  is  a  Beta  density  funetion  given  by  [45]; 
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Here  W  denotes  the  number  of  samples  and  '  J'  denotes  the  dimension  of  the 
image,  whieh  is  one  for  a  single -band  ease.  In  standard  form  the  expression  in  Equation 
(5.3)  ean  be  written  as: 


B{x:Y,rj)  = 


r(r  +  ?7) 
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y  >  0,  rj  >0  and  0  <  x  <  1 , 


(5.6) 


where  /  =  ~  rj  =  — - — .  (5.7) 

Another  widely  used  Beta  model  in  literature,  [11],  is  the  so  ealled  three  parameter 
Beta  given  by: 


B{x:r,ri,X) 


0  <  X  <  \',y,ri,A,  >  0, 


(5.8) 


where,  B{y,  rj)  is  the  eomplete  Beta  funetion. 
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Equation  (5.6)  reduces  to  the  two  parameter  Beta  distribution  for  A  =1.  The 
parameter,  ‘/I,’  allows  the  three  parameter  Beta  distribution  to  have  a  wider  variety  of 
shapes  than  the  standard  two  parameter  Beta.  This  is  an  important  property  in  modeling 
because  the  three  parameter  Beta  has  an  additional  flexibility.  For  example,  if  in  the  two 
parameter  Beta,  y^ij,  then  the  distribution  is  symmetric  with  a  mean  value  at  0.5. 
However,  the  three  parameter  Beta  can  be  skewed  depending  on  ‘  /I  ’  because  the 
kurtosis,  mode  and  skewness  depend  on  ‘  /I .’ 


5.3.  THE  NON-MAX  SUPPRESSION 

The  output  of  the  RX  algorithm  is  basically  the  “signal-to-clutter”  image.  This  list 
of  detections  consists  of  the  coordinates  of  the  potential  targets  along  with  the  RX  test 
statistic  value.  This  is  then  subjected  to  the  non-max  suppression  in  order  to  get  a  list  of 
targets  that  have  been  highlighted  by  the  anomaly  detector.  Non-maximal  suppression  is 
a  processing  algorithm  that  suppresses  (makes  zero)  all  the  targets  in  a  pre-defined  R- 
pixel  radius  neighborhood  except  the  local  maximum. 

Let /  (x\Ho)  be  the  probability  density  function  for  the  RX  clutter  statistic  x  under 
null  hypothesis  after  non-max  suppression.  Let  the  function  g(l)  represent  the  mapping 
function  performed  by  the  non-max  operation  and  'Wj^'  be  the  i?-pixel  neighborhood. 
Thus, 


g(/)  = 


if  Xj  =  Xi  =  local  maximum  in 
elsewhere 


(5.9) 


The  probability  density  function  used  to  model  the  background  clutter  statistics 
after  non-max  suppression  is  given  as: 


/(•■'  I  gV)  =  1)  = 


00 


(5.10) 
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where  / (x  \  Ho)  is  the  probability  due  to  RX  statisties  under  null  hypothesis,  N  =  0A  is 
the  expeeted  number  of  independent  targets  present  in  the  neighborhood  'A'  is  the 
area  of  the  neighborhood  and  '6'  is  the  density  of  the  potential  targets.  Also, 


F{x)  =  CDF  of  x  =  /(x  <  X  I  //() )  =  I  /(x  I  //q  )dx  , 


(5.11) 


where  CDF  is  the  Cumulative  Distribution  Funetion. 


Thus  for  example  in  the  ease  of  three  parameter  Beta  distribution. 


f(x\H^)=B{x\y,ri,X)  = 


;fx^^'(i-x) 


7+1 


0  <  X  <  1;  y,;;,/!  >  0 .  (5.12) 


Then,  the  modified  three  parameter  Beta,  after  the  non-max  suppression  has  the 
distribution  given  by: 


B{x:r,ri,Z,N)  = 


00 


(5.13) 


where,  F3(x)  is  the  CDF  of  f{x\HQ)  . 


In  this  thesis,  5(x  :  Y,ri)  ,B{x  :  y,;7,/l)and5(x  :  y,ri,Ji,N)  are  the  Beta  distributions 
used  to  model  the  deteetion  statistie.  The  Gamma  model  used  for  modeling  is  G{r 
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5.4,  TEST  STATISTICS  TO  MEASURE  GOODNESS  OF  FIT 

5,4,1  Various  Test  Statistics.  A  list  of  various  test  statistics  that  measure  goodness 
of  fit  is  presented  along  with  their  description  in  Appendix — B.  These  tests  include  the 
following: 


(i)  Chi-Square  Test 

(ii)  Cramer  Von-Mises  (CVM)  Test 

(iii)  Kolmogorov-Smirnov  (KS)  or  Kuiper  Test 

The  main  limitation  of  the  CVM,  KS  or  Kuiper  tests  is  that  in  literature  the  critical 
values  for  the  test  statistics  are  given  only  for  some  given  distributions,  such  as 
Exponential,  Weibull  and  Normal.  Thus  these  test  statistics  cannot  be  used  for 
applications  where  different  distributions  are  used.  If  the  distribution  is  completely 
specified,  then  these  test  perform  well  and  have  a  high  power.  But  if  the  parameters  are 
estimated  from  the  data,  then  the  power  of  these  tests  reduces.  This  is  because  new 
critical  values  need  to  be  formulated  and  these  must  be  obtained  for  the  specific 
parametric  form  that  is  to  be  tested. 


On  the  other  hand,  this  does  not  pose  a  problem  for  the  Chi-Square  test  because  the 
number  of  degrees  of  freedom  is  adjusted  in  accordance  with  the  parameters  that  are 
estimated  from  the  data.  It  is  because  of  the  flexibility  of  the  Chi-Square  test  that  it  was 
used  for  testing  the  modeling  results.  The  specifics  of  this  test  are  discussed  in  the 
following  section. 

5,4,2.  The  Chi-Square  Test,  The  Chi-Square  test  is  described  in  detail  in 
Appendix — B.  This  section  briefly  presents  the  test  and  the  way  it  is  applied  in  this  study. 
For  the  Chi-Square  test,  the  samples  are  grouped  in  bins  in  such  a  way  that  there  are  at 
least  a  certain  fixed  number  of  samples  per  bin.  Also,  when  the  test  statistic  is  evaluated 
for  the  frame  'k\  the  samples  (RX  values)  from  the  three  frames,  ‘A:’  and  "k+P  are 
considered. 
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To  evaluate  the  modeling  performanee  of  the  baekground,  it  is  necessary  to  make 
sure  that  the  samples  are  not  anomalies.  There  are  about  400  samples  per  frame,  and 
about  10  samples  per  frame  are  considered  anomalous  (i.e.  these  don’t  represent  the 
background).  Therefore,  a  threshold  is  set  so  as  to  remove  the  top  30  samples  (assuming 
that  they  are  anomalous),  and  so  the  top  30  samples  are  not  considered  while  evaluating 
the  test  performance.  Depending  on  the  number  of  parameters  that  are  to  be  estimated, 
the  degrees  of  freedom  is  calculated.  The  test  statistic  is  then  calculated  for  a  particular 
frame.  After  calculating  the  test  statistic  and  the  degrees  of  freedom  for  a  particular 
frame,  the  threshold  is  found  for  the  given  confidence  level.  The  confidence  level 
generally  taken  is  in  the  range  of  0.90  to  0.95.  For  the  current  results,  a  confidence  level 
of  0.95  is  used. 

5.4,3.  Performance  Evaluation,  After  calculating  the  threshold,  one  of  the 
following  two  cases  arise: 

Case  -  1  (Test  Statistic  <  Threshold):  In  this  case  one  says  that  the  frame  passes  the  test. 
For  a  test  at  a  confidence  level  of  0.95  it  can  be  said  with  95%  confidence  level  that  the 
background  represented  by  the  frame  can  be  modeled  by  the  given  distribution. 

Case  -  2  (Test  Statistic  >=  Threshold):  In  this  case  one  says  that  the  frame  fails  the  test. 
For  a  test  at  a  confidence  level  of  0.95  it  can  be  said  with  95%  confidence  level  that  the 
background  represented  by  the  frame  cannot  be  modeled  by  the  given  distribution. 

Once  the  decision  has  been  made  for  the  frame,  the  test  is  performed  on  all  the 
frames  of  the  given  dataset  and  the  percentage  of  frames  that  have  passed  the  test  is 
found.  Thus  on  an  average  it  is  expected  that  95%  of  the  frames  will  pass  the  test  and  5% 
would  fail.  If  the  fail  percentage  is  much  higher  than  5%  one  can  say  that  the  modeling  is 
bad  otherwise  the  modeling  is  good.  It  is  clear  from  the  above  procedure  that  the  pass 
percentage  is  a  good  performance  measure  to  judge  the  modeling  ability  of  the  given 
distribution  to  model  the  background  represented  by  the  frames. 
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5.5  THE  KULLBACK-LEIBLER  (KL)  DIVERGENCE 

The  KL  divergence  [30]  is  a  way  to  measure  the  closeness  between  the  two  pdfs.  It 
can  also  be  seen  as  a  measure  of  relative  entropy  between  two  pdfs  [31].  Generally  the 
KL  divergence  is  used  to  measure  the  closeness  between  the  actual  distribution  and  the 
modeled  distribution.  The  closer  the  pdfs  are  to  each  other,  the  smaller  is  the  divergence. 
The  KL  divergence  between  two  pdfs  p{x)  and  q{x)  is  given  as: 


D{p{x),q{x)) 
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(5.14) 


Clearly  if  the  pdfs  are  equal,  then  the  logarithmic  terms  become  zero  and  the 
divergence  between  them  is  zero.  Also,  the  divergence  between  the  two  pdfs  is  always 
non-negative.  In  all  the  modeling  results  that  are  shown,  the  divergence  between  the 
modeling  distribution  and  the  sample  histogram  is  given  in  the  legend  of  the  figure.  In  the 
legend,  the  divergence  value  is  given  in  brackets,  next  to  the  modeling  distribution. 


5.6.  REMOVAL  OF  BAD  DATA 

Certain  frames  in  the  data  don’t  represent  the  background.  Therefore,  these  frames 
are  not  considered  for  the  test.  Some  of  these  frames  are  shown  in  Figure  5.1. 


Y1 G01 P3LIN274  to  Y1 G01 P3UN276 


(a)  Frame  Nos.  274-276 
Figure  5.1.  Frames  Showing  Data 


Y1 G01 P3UN285  to  Y1 G01 P3UN287 


(b)  Frame  Nos.  285-287 
Representing  the  Background. 
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5.7.  PARAMETER  ESTIMATION  USING  LEAST  SQUARES 

The  results  of  EM  estimation  are  eompared  to  another  method  of  estimating  the 
parameters  of  a  given  distribution.  This  method  is  the  method  of  Least  Squares  (LS)  as 
diseussed  in  [23],  where  it  is  used  for  estimating  the  parameters  of  the  modified  three 
parameter  Beta  distribution.  In  this  method,  to  estimate  the  values  of  the  parameters  of 
the  modified  three  parameter  Beta  distribution,  an  arbitrary  sample  spaee  for  the  set  of 
parameters  {/,?],  A,  N)  is  assumed.  This  sample  spaee  forms  the  initial  seareh  spaee  and  a 
brute  foree  seareh  is  done  aeross  the  sample  spaee,  to  identify  a  set  of  values  for  /,rj,A 
and  N  that  is  at  a  minimum  distanee  from  the  measured  pdf  In  order  to  do  this,  for  every 
set  of  parameters,  the  error  between  modeled  pdf  and  the  observed  pdf  is  ealeulated  by 
taking  the  Euelidean  distanee  between  the  two  pdfs.  Amongst  the  error  values  obtained, 
the  minimum  error  is  determined  and  the  set  of  parameters  {/,7J,A,N)  eorresponding  to 
this  minimum  error  is  taken  as  the  first  estimate  of  the  parameters.  After  this  eoarse 
estimate  of  the  parameters  is  obtained,  a  finer  resolution  sample  spaee  around  the  eoarse 
estimates  is  assumed.  The  method  of  least  squares  is  reiterated.  This  seeond  iteration 
results  in  a  finer  resolution  on  the  estimated  parameters. 

In  the  next  seetion,  the  modeling  results  of  the  parameter  estimation  teehniques 
using  the  EM  algorithm  and  the  method  of  the  Least  Squares  are  diseussed. 


5.8.  MODELING  RESULTS 

This  seetion  ineludes  a  visual  inspeetion  to  get  an  idea  of  the  fit  of  the  distributions 
with  the  data.  A  more  rigorous  quantitative  analysis  has  been  done  after  that  when  the 
results  for  the  Chi-Square  test  have  been  reported  on  the  given  data.  As  mentioned 
before,  for  evaluating  the  performanee  for  frame  ‘A:’,  the  samples  from  the  three  frames, 
"k-lA  ‘A:’  and  ‘k+T,  are  eonsidered.  Therefore,  while  displaying  the  modeling  results,  the 
images  have  been  registered  to  show  all  three  eonseeutive  frames. 

Eigure  5.2  shows  a  sample  modeling  result  for  the  Beta  and  Gamma  distribution.  It 
shows  the  pdf  of  the  two  eonstituent  elasses  along  with  the  pdf  of  their  sum.  It  is  to  be 
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noted  that  the  Gamma  distribution  is  plotted  on  the  ‘  r  ’  domain  whereas  the  Beta 
distribution  is  plotted  on  the  ‘  x  ’  domain.  In  part  (a)  of  the  figure,  it  can  be  seen  that  there 
are  two  prominent  regions  in  this  image.  One  region  is  the  background  where  as  the  other 
region  is  some  vegetation  like  trees.  The  background  containing  the  two  regions  has  been 
modeled  by  the  two  classes  as  shown  in  parts  (b)  and  (c). 


Y1G01P3LIN099  to  Y1G01P3LIN101 


(a)  Registered  Image — May  2003  Data  (Nighttime) 


Background  Modeling  -  RX  (Y1G01P3LIN100) 


Background  Modeling  ■  RX  (Y1G01P3LIN100) 


(b)  Gamma  2-Parameters  (c)  Beta  2-Parameters 

Figure  5.2.  Background  Modeling  Showing  the  Mixture  of  Two  Classes. 
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Figures  5.3  (a)  and  5.4  (a)  show  the  registered  image  of  the  frames  under 
eonsideration  for  May  2003  nighttime  data.  The  dataset  name  for  the  result  is  mentioned 
at  the  top  of  every  figure.  In  all  the  figures  showing  the  modeling  results  of  both  Beta  and 
Gamma  distributions,  the  histogram  are  drawn  in  eommon  ‘  x  ’  domain  ( x  e  [0,1] )  for 
easy  eomparison.  Part  (b)  of  Figures  5.3  and  5.4  shows  the  lit  of  modified  three 
parameter  Beta  distribution  (Estimated  using  Least  Squares)  and  the  two  parameter 
Gamma  distribution  (Estimated  using  EM),  with  the  histogram  of  the  samples.  Part  (e)  of 
Eigures  5.3  and  5.4  shows  the  fit  of  the  two  parameter  Beta,  three  parameter  Beta  and  the 
modified  three  parameter  Beta  distributions  (all  estimated  using  EM),  with  the  histogram 
of  the  samples. 

The  pass  pereentage  of  various  models  for  RX  samples  for  the  dataset, 
Y1G01P3LIN,  is  reported  in  Table  5.1.  It  ean  be  elearly  seen  that  the  performanee  of  the 
two  parameter  Gamma  and  Beta  mixture  models  is  the  best  at  96.06%  and  95.14% 
respeetively.  This  is  followed  by  the  performanee  of  the  modified  three  parameter  Beta 
model  obtained  using  least-squares  that  has  a  pass  pereentage  of  94.68%.  It  ean  be  noted 
that  these  pereentages  are  quite  elose  to  95%  as  was  expeeted  at  a  eonfidenee  level  of 
95%. 


As  mentioned  in  Seetion  2.7,  as  the  number  of  parameters  to  be  estimated 
inereases,  the  log-likelihood  surfaee  is  no  longer  steep.  The  surfaee  beeomes  flat,  and  all 
the  parameters  do  not  eonverge  to  a  steady  value.  In  faet,  the  parameters  tend  to  eonverge 
to  a  range  of  values.  Beeause  the  log-likelihood  surfaee  is  flat,  the  eonvergenee  is  not 
very  stable,  and  the  performanee  deteriorates  as  the  number  of  parameters  inerease.  This 
is  the  main  reason  for  the  poor  performanee  of  the  modified  three  parameter  Beta  model 
obtained  using  EM. 
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Y1G01P3LIN039  to  Y1G01P3LIN041 


(a)  Background  Only  Registered  Image — May  2003  Data  (Nighttime) 


Background  Modeling  ■  RX  (Y1G01P3LIN040)  Background  Modeling  ■  RX  (Y1G01P3LIN040) 


(b)  Least-Squares  and  Gamma  2-Parameters  (c)  Beta  2,  3  and  Modified  3-Parameters 
Figure  5.3.  Background  Modeling  for  Different  Distributions — ^Nighttime. 
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Y1G01P3LIN137  to  Y1G01P3LIN139 


(a)  Registered  Image  With  Mines — ^May  2003  Data  (Nighttime) 


Background  Modeling  ■  RX  (Y1G01P3LIN138)  Background  Modeling  ■  RX  (Y1G01P3LIN138) 


(b)  Least-Squares  and  Gamma  2-Parameters  (c)  Beta  2,  3  and  Modified  3-Parameters 
Figure  5.4.  Background  Modeling  With  Mines  for  Different  Distributions — ^Nighttime. 
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Table  5.1.  Pass  Percentages  of  Various  Mixture  Models. 


Modeling 

Distribution 

Pass  Percentages  (%)  for 

95%  confidence  level  for 

RX 

Gamma  2-Parameters 

96.06 

Beta  2-Parameters 

95.14 

Beta  3-Parameters 

90.05 

Modified  3  Parameter  Beta 

68.75 

Modified  3  Parameter  Beta 

(Least  Squares) 

94.68 

After  considering  the  performances  of  different  models,  the  further  analysis  was 
carried  extensively  on  various  datasets  using  the  two  parameter  Beta  and  Gamma 
distributions  only,  because  of  their  superior  performance.  The  analysis  was  done  for  both 
RX  and  Radial  Anomaly  Detector  (RAD)  [35]  samples.  Figures  5.5-5.10  show  the 
modeling  results  for  these  two  distributions  for  the  May  2003  data  (daytime  and 
nighttime)  and  October  2002  data  (daytime).  Part  (a)  of  the  Figures  5.5-5.10  shows  the 
registered  image  of  the  frames.  Part  (b)  of  the  Figures  5.5-5.10  shows  the  modeling 
results  of  the  two  parameter  Gamma  and  Beta  distributions  for  RX  samples.  Part  (c)  of 
the  Figures  5.5-5.10  shows  the  results  of  the  same  distributions  for  RAD  samples. 

It  can  be  seen  that  the  histogram  of  the  RAD  samples  is  noisier  than  that  of  the  RX 
samples.  Part  of  the  reason  is  that  in  case  of  RX,  the  samples  are  distributed  in  a  range  of 
0-0.2,  whereas  in  case  of  RAD,  the  samples  are  distributed  in  a  wider  range  of  0-0.6, 
which  makes  the  observed  histogram  for  RAD  seem  noisier. 


PDF  of  X- 


63 


Background  Modeling  -  RX  (Y1F01P2LFN135)  Background  Modeling  -  RAD  (Y1F01P2LFN135) 


(b)  RX  (c)  RAD 

Figure  5.5.  Background  Modeling  for  May  2003  Data  for  Daytime. 
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Y1H01P2Lmi14  to  Y1H01P2LIN116 


(a)  Registered  Image — May  2003  Data  (Daytime) 


Background  Modeling  ■  RX  (Y1H01P2LIN115)  Background  Modeling  ■  RAD  (Y1H01P2LIN11S) 


(b)  RX  (e)  RAD 

Figure  5.6.  Baekground  Modeling  for  May  2003  Data  for  Daytime. 
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Y1G01P3LJN143  to  Y1G01P3LJN145 


(a)  Registered  Image — May  2003  Data  (Nighttime) 


Background  Modaling  -  RX  (Y1G01P3LJN144)  Background  Modeling  -  RAD  (Y1G01P3LJN144) 


Figure  5.7.  Background  Modeling  for  May  2003  Data  for  Nighttime. 
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Y1I01P3LKN065  to  Y1I01P3LKN067 


(a)  Registered  Image — May  2003  Data  (Nighttime) 


Background  Modeling  -  RX  (Y1I01P3LKN066)  Background  Modeling  -  RAD  (Y1I01P3LKN066) 


Figure  5.8.  Baekground  Modeling  for  May  2003  Data  for  Nighttime. 
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DT17CP3LON009  to  DT17CP3LON011 


(a)  Registered  Image — October  2002  Data 


Background  Modeling  -  RX  (DT17CP3LON010)  Background  Modeling  -  RAD  (DT17CP3LON010) 


(b)  RX  (c)  RAD 

Figure  5.9.  Background  Modeling  for  October  2002  Data. 
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DT17CP3LNN067  to  DT17CP3LNN069 


(a)  Registered  Image — October  2002  Data 


Figure  5.10.  Background  Modeling  for  October  2002  Data. 
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Because  the  histogram  of  the  RX  samples  is  smoother  in  comparison  to  that  of  the 
RAD  samples,  the  average  divergence  between  the  actual  and  modeled  pdf  in  case  of 
RAD  samples  is  more  (~  0.3)  than  the  average  divergence  in  case  of  the  RX  samples  (~ 
0.1).  From  the  results  it  can  be  seen  that  the  performance  of  the  two  models  is  quite 
comparable  for  RX  and  RAD  detection  statistics.  The  modeling  performance  of  the  two 
mixture  models  is  nearly  the  same  for  the  two  types  of  data  in  the  case  of  RX.  However, 
in  the  case  of  RAD,  the  modeling  performance  of  the  Gamma  mixture  model  goes 
slightly  down  for  the  October  2002  data.  This  can  also  be  seen  from  Figure  5.9  (c),  where 
the  modeling  by  the  two  parameter  Gamma  model  is  not  good  for  RAD  samples.  Tables 
5. 2-5. 7  show  the  pass  percentage  of  these  distributions  for  RX  and  RAD  samples. 

Table  5.2.  Pass  Percentages  for  RX — May  2003  Daytime. 


May  2003 

Pass  Percentages  (%)  for  95%  confidence  level,  RX 

(Day  Time) 

Beta  2-Parameters 

Gamma  2-Parameters 

Y1F01P2LFN 

95.36 

93.82 

Y1F01P2LDN 

90.53 

94.71 

Y1F01P2LEN 

92.53 

93.41 

Y1H01P2LIN 

90.64 

93.15 

Table  5.3.  Pass  Percentages  for  RX 

— May  2003  Nighttime. 

May  2003 

Pass  Percentages  (%)  for  95%  confidence  level,  RX 

(Night  Time) 

Beta  2-Parameters 

Gamma  2-Parameters 

Y1I01P3LKN 

94.48 

94.70 

Y1G01P3LIN 

94.44 

94.21 

Y1G01P3LJN 

92.66 

94.66 

Y1G01P3LKN 

94.97 

91.10 
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Table  5.4.  Pass  Percentages  for  RX — October  2002. 


Oct  2002 

Pass  Percentages  (%)  for  95%  confidence  level,  RX 

Beta  2-Parameters 

Gamma  2-Parameters 

DT17BP2LEN 

95.10 

97.20 

DT17CP3LNN 

99.30 

100.00 

DT17CP3LON 

95.80 

98.60 

DT17AP1LAN 

88.81 

93.00 

Table  5.5.  Pass  Percentages  for  RAD — May  2003  Daytime. 


May  2003 

Pass  Percentages  (%)  for  95%  confidence  level,  RAD 

(Day  Time) 

Beta  2-Parameters 

Gamma  2-Parameters 

Y1F01P2LFN 

99.34 

98.89 

Y1F01P2LDN 

98.89 

95.60 

Y1F01P2LEN 

98.68 

97.80 

Y1H01P2LIN 

99.54 

98.40 

Table  5.6.  Pass  Percentages  for  RAD — May  2003  Nighttime. 


May  2003 

Pass  Percentages  (%)  for  95%  confidence  level,  RAD 

(Night  Time) 

Beta  2-Parameters 

Gamma  2-Parameters 

Y1I01P3LKN 

98.23 

98.23 

Y1G01P3LIN 

96.53 

97.90 

Y1G01P3LJN 

94.00 

97.33 

Y1G01P3LKN 

94.97 

98.63 
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Table  5.7.  Pass  Percentages  for  RAD — October  2002. 


Oct  2002 

Pass  Percentages  (%)  for  95%  confidence  level,  RAD 

Beta  2-Paranieters 

Gamma  2-Parameters 

DT17BP2LEN 

99.30 

94.40 

DT17CP3LNN 

100.00 

92.28 

DT17CP3LON 

97.20 

85.31 

DT17AP1LAN 

100.00 

89.44 

It  can  be  recalled  from  section  5.4.3  that  for  good  modeling  it  is  expected  that  on 
an  average  95%  of  the  frames  should  pass  the  test.  From  the  results  in  Tables  5. 2-5. 7,  it 
can  be  observed  that  the  pass  percentages  of  the  two  parameter  Beta  and  Gamma  model 
are  close  to  95%.  Therefore  the  modeling  by  the  above  models  can  be  considered  to  be 
good.  Also  by  observing  the  overall  performance  across  the  samples  and  the  data,  it  is 
evident  that  the  performance  of  the  Beta  mixture  model  is  comparable  to  that  of  the 
Gamma  mixture  model  for  the  May  2003  data.  However  for  the  October  2002  data,  the 
performance  of  the  Beta  model  is  better  than  that  of  the  Gamma  model  for  RAD  samples. 

On  observing  the  October  2002  data,  it  can  be  seen  that  many  frames  have  images 
of  vegetation  such  as  trees  and  bushes.  In  fact,  a  good  portion  of  the  dataset  is  covered  by 
frames  having  images  of  trees.  Thus,  these  frames  don’t  represent  a  homogeneous 
background.  This,  along  with  the  fact  that  the  RAD  samples  are  quite  noisy,  brings  down 
the  performance  of  the  Gamma  model  for  this  data.  The  fit  of  the  Beta  model  is  good  in 
spite  of  the  non-homogeneity  of  the  data.  This  shows  that  the  Beta  model  is  robust  in 
modeling  these  non-homogeneous  areas. 


5.9.  FRAME  CHARACTERIZATION 

This  section  shows  how  to  characterize  a  frame  based  on  the  mixing  proportions  of 
the  classes.  This  gives  some  information  about  the  homogeneity  of  the  data.  The 
characterization  based  on  the  detection  statistic  is  important  since  it  represents  the  spatial 
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correlations  and  the  loeal  non-homogeneities  in  the  data.  Therefore  the  deteetion  statistie 
represents  the  true  nature  of  the  baekground.  After  investigating  the  elass  eoeffieients  of 
the  frames,  Table  5.8  is  obtained.  The  data  has  been  modeled  by  two  elasses  by  the 
Gamma  mixture  model  with  two  parameters. 


Table  5.8.  Frame  Distribution. 


Distribution  of  the  Coefficient  of  the  Major  Class  Across  Frames 

Modeled  by  Gamma  2-Parameters  using  Two  Classes  (for  RX) 

Run  Names 

Coefficient 

DT17CP2LMN 

Y1F01P2LFN 

Y1G01P3LIN 

range  (Major 

(Total 

(Total 

(Total 

Class) 

Frames/Frames 

Frames/Frames 

Frames/Frames 

Failed) 

Failed) 

Failed) 

0.98  to  1.00 

68/6 

139/18 

5/1 

0.95  to  0.98 

12/1 

21/2 

7/0 

0.90  to  0.95 

9/0 

16/1 

35/2 

0.80  to  0.90 

12/0 

20/0 

147/8 

0.70  to  0.80 

21/0 

48/1 

148/8 

0.60  to  0.70 

12/0 

79/0 

62/5 

0.50  to  0.60 

9/0 

130/6 

28/1 

Total 

143/7 

453/28 

432/25 

Based  on  the  Table  5.8,  three  main  eategories  are  eonsidered.  They  are  as  follows: 

Categorv-1  (1  >  Major  elass  eoeffieient  >  0.98):  This  eategory  eorresponds  to  the  frames 
that  essentially  eomprise  the  baekground.  The  baekground  represented  by  these  frames 
ean  be  ealled  benign  in  eomparison  to  the  baekground  represented  by  the  frames  from  the 
other  eategories.  In  the  ease  of  the  dataset,  Y1G01P3LIN,  most  of  the  frames  represent  a 
baekground  eonsisting  of  some  type  of  vegetation,  i.e.  grass,  bushes  and  trees.  Due  to 
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this,  a  majority  of  frames  have  a  major  eoeffieient  in  the  range  of  0.7  to  0.9.  In  the  ease  of 
the  datasets  DT17CP2LMN  and  Y1F01P2LFN,  the  frames  mainly  represent  the 
baekground  alone.  Therefore,  most  of  the  frames  belong  to  this  eategory. 

Categorv-2  (0.98  >  Maior  elass  eoeffieient  >  0.70)i  Generally,  these  are  the  frames  that 
eonsist  of  some  elusters  of  features  that  are  essentially  not  the  part  of  the  baekground.  For 
the  dataset  Y1G01P3LIN,  the  data  show  that  most  of  the  frames  eontain  a  baekground 
that  has  mixing  eoeffieients  that  belong  to  this  eategory.  However  in  ease  of  the  other  two 
datasets  not  many  frames  belong  to  this  eategory. 

Categorv-3  (0.70  >  Maior  elass  eoeffieient  >  0.50)i  These  are  the  frames  that  are  highly 
non-homogeneous.  In  these  frames,  the  baekground  is  mainly  eomprised  of  elusters  of 
features  that  are  not  the  part  of  the  baekground. 

Some  representative  frames  from  these  three  eategories  are  shown  in  Figure  5.11. 
Parts  (a),  (e)  and  (e)  of  the  figure  show  the  registered  frames.  Parts  (b),  (d)  and  (f)  show 
the  modeling  using  two  parameter  Gamma  mixture  model.  The  modeling  is  done  using 
two  elasses.  The  pdf  of  the  eonstituent  elasses  along  with  their  sum  is  also  shown.  The 
major  and  minor  elasses  have  been  shown  in  the  figure.  The  sum  of  the  pdfs  of  the 
elasses  is  used  to  model  the  histogram  of  the  samples. 

It  ean  be  elearly  seen  that  for  the  seeond  and  third  eategory,  the  minor  elass  is  quite 
prominent.  For  the  first  eategory  the  minority  elass  has  a  very  insignifieant  eontribution 
to  the  overall  fit. 
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Y1G01P3LIN195  to  Y1G01P3LIN197 


(a)  Coefficient  of  Major  Class  =  0.99 


Background  Modeling  ■  RX  (V1G01P3LIN196) 


Y1G01P3L1N008  to  Y1G01P3LIN010 


(c)  Coefficient  of  Major  Class  =  0.82 


Background  Modeling  ■  RX  (Y1G01P3LINQ09) 


(d)  Modeling 


Y1G01P3LIN353  to  Y1G01P3LIN355 


Background  Modeling  ■  RX  (Y1G01P3LIN354) 


(e)  Coefficient  of  Major  Class  =  0.58 
Figure  5.11.  Background  Modeling  for  Different  Categories  Using  Gamma  Distribution. 
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5.10.  CONCLUSIONS 

This  section  has  shown  the  application  of  the  EM  algorithm  in  modeling  of  the  RX 
and  RAD  statistics.  The  modeling  performance  of  the  various  mixture  models  was  also 
evaluated,  using  the  Chi-Square  test.  It  can  be  clearly  seen  that  the  two  parameter  Beta 
and  Gamma  mixture  models  have  the  best  performance.  A  brief  characterization  of  the 
background  based  on  the  coefficient  of  the  majority  class  was  also  done. 
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6.  THRESHOLD  SELECTION 


6.1.  OVERVIEW 

In  ATR  systems  [7],  often  there  is  a  front-end  deteetion  stage  or  anomaly  deteetion 
stage.  This  is  also  known  as  the  presereening  stage.  The  aim  of  this  stage  is  to  seleet  the 
targets  at  a  given  False  Alarm  Rate  (FAR)  and  then  pass  these  targets  to  the  next  stage. 
The  threshold  value  is  the  minimum  value  of  the  deteetion  statistie  at  whieh  a  partieular 
false  alarm  rate  is  aehieved.  In  order  to  deteet  a  high  pereentage  of  targets,  the 
presereening  stage  often  uses  a  low  threshold  that  may  result  in  a  large  number  of  false 
alarms.  A  large  number  of  false  alarms  inerease  the  burden  on  the  later  stages  and 
therefore  the  presereening  stage  should  be  designed  in  sueh  a  way  so  that  it  reduees  the 
number  of  false  alarms  while  maintaining  a  high  probability  of  deteetion. 

In  many  oases,  the  prooess  of  threshold  seleotion  simply  involves  seleoting  the 
highest  'T  targets  based  on  their  deteetion  statistie.  This  method  of  pre-speoifying  the 
number  of  targets  per  frame,  though  simple  in  a  functional  sense,  fails  to  provide  any 
insight  into  the  nature  of  the  background  or  in  providing  the  mechanism  of  comparing 
thresholds  across  different  datasets  and  anomaly  detector  algorithms.  This  is  basically 
because  the  detection  statistics  by  themselves  are  affected  by  the  spatial  correlations  and 
local  non-homogeneity  in  the  image.  Also,  threshold  selection  based  on  desired  number 
of  targets  per  frame  does  not  statistically  quantify  the  true  anomalies.  Therefore,  the  need 
for  a  more  sophisticated  threshold  selection  scheme  is  evident. 

This  section  addresses  the  application  of  background  modeling  in  the  adaptive 
Constant  False  Alarm  Rate  (CFAR)  threshold  selection.  Here  the  distribution  of  the  RX 
statistics  of  the  background  is  modeled  using  the  two  parameter  Beta  and  Gamma  model 
along  with  the  modified  two  parameter  Beta  model  discussed  in  the  next  section.  The 
threshold  that  confirms  to  the  desired  FAR  is  then  selected  by  inverse  mapping  of  the 
Cumulative  Distribution  Function  (CDF)  of  this  model.  The  mapping  onto  a  probabilistic 
model  helps  select  a  threshold  that  is  invariant  to  the  background  that  the  detector  is 
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operating  in.  This  mapping  onto  a  distribution  also  helps  map  the  anomaly  deteetion 
statisties  aeross  different  algorithms  onto  a  eommon  distribution,  thus  faeilitating 
eomparison  among  their  statisties  for  sensor  fusion  and  algorithm  level  fusion.  This 
threshold  whieh  depends  on  the  speeified  CFAR  value,  determines  the  number  of 
seleeted  deteetions  above  the  threshold.  This  number  of  deteetions  for  baekground  frames 
gives  a  good  evaluation  of  the  modeling  eapability  of  the  distribution  used  for 
baekground  modeling. 


6.2,  THE  MODIFIED  TWO  PARAMETER  BETA  MODEL 

From  the  diseussion  of  Seetion  5.2  and  5.3,  it  is  known  that  if  f{x  \  Ho)  is  the 
probability  distribution  of  the  RX  statisties  under  null  hypothesis,  then  after  the  non-max 
suppression,  the  probability  distribution  of  the  loeally  maximum  deteetion  statisties  under 
null  hypothesis  beeomes: 


fix  I  g(/)  =  1)  = 


/(x|/7o).e 


V(l-f(x)) 


|/(x  I 
0 


(6.1) 


For  two  parameter  Beta  model,  the  distribution  under  null  hypothesis  is  given  by: 

f2(x\Ho)=^ir,^),  (6.2) 


where. 


B{x:r,ri)  = 


nrWin) 


y  >  0,  r/  >0  and  0  <  x  <  1 . 


(6.3) 


Then  the  modified  two  parameter  Beta  distribution  is  given  by: 
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=  .  (6.4) 

where  F2(x)  is  the  eumulative  distribution  funetion  of /j  (x) ,  the  generalized  two 
parameter  Beta  distribution,  ‘if’  is  a  normalizing  eonstant  and  is  given  by: 

K  =  (6.5) 

0 

Parametrie  estimation  of  this  modified  two  parameter  Beta  model  f^{x)  involves 

obtaining  estimates  ‘  y° ‘  77°  ’  and  ‘  A^°  ’  of  the  three  parameters  ‘  y ‘  7  ’  and  "N’  for  the 
input  image.  The  parameters  are  estimated  using  the  EM  algorithm.  As  mentioned  in  the 
previous  seetion,  to  ensure  a  good  number  of  sample  points  for  modeling,  the  targets 
from  the  eurrent  frame  and  ±  1  frame  about  it  are  used  for  modeling  the  eurrent  frame.  A 
typieal  frame  eonsists  of  about  380  targets,  and  thus  about  1140  (380x3)  targets  are  used 
to  model  every  frame.  This  distribution  is  eompared  with  the  two  parameter  Beta  and 
Gamma  models  for  eomparison.  The  update  equations  for  the  modified  two  parameter 
Beta  distribution  are  given  in  appendix — ^A.1.4. 


6.3.  THRESHOLD  CALCULATION  FOR  A  GIVEN  CFAR 

In  this  seetion,  the  working  proeedure  for  ealeulating  the  threshold  from  the  model 
is  diseussed. 

Let  ‘Nr’  and  ‘Nc’  be  the  number  of  rows  and  columns  in  the  image.  The  GSD  is  the 
Ground  Sampling  Distance  for  the  airborne  imagery  in  inches.  If  there  are  W/  targets  per 
image  segment  (frame),  then  the  probability  of  false  alarm,  ‘P/  is  given  as: 

^  _  {CFAR).A 


N, 


(6.6) 
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2 

where  ‘A  ’  of  the  image  segment  represents  the  area  in  meter  and  is  given  as: 

A  =  (Nr.Nc)(GSDx0.0254f  (6.7) 

Thus  onee  the  CFAR  is  determined,  the  ‘P/  gets  fixed.  Corresponding  to  this  'Pf,  ’ 
the  inverse  mapping  of  the  CDF,  that  would  give  the  threshold,  ean  be  found  as  follows. 

Let  F(x)  be  the  CDF  of  the  modeling  distribution  / (x)  ,  and  ‘P/  be  the  probability 
of  false  alarms  obtained  from  the  Equation  6.6,  then  the  threshold,  ‘7)’  ean  be  obtained 
as: 


T  =  arg[F(x)  =  (l  -  P, )]  =  F"' (l  -  F, ).  (6.8) 

Here,  / (x)  ean  be  any  modeling  distribution.  In  this  thesis,  two  parameter  Beta,  two 
parameter  Gamma  and  modified  two  parameter  Beta  distributions  have  been  used  for 
determining  the  threshold  by  inverse  mapping  of  the  CDF.  In  ease  of  the  Beta 

T 

distribution,  it  is  neeessary  to  apply  the  reverse  transformation  [i.e.  to  get  the 

eorreet  threshold  for  the  RX  deteetion  statistie.  This  transformation  is  not  required  for  the 
Gamma  distribution  that  is  represented  by  the  RX  statistie  ‘r.’  Please  see  Seetion  5.2  for 
the  diseussion  of  deteetion  statistie,  ‘r’  and  ‘  x .’ 

6.4,  PERFORMANCE  EVALUATION 

The  Chi-Square  test  has  been  used  to  evaluate  the  performanee  of  the  distributions 
for  the  threshold  analysis.  The  Chi-Square  test  employed  for  the  threshold  analysis  is  a 
little  different  from  the  one  used  for  obtaining  the  results  in  the  previous  seetion  (Seetion 
5.4.2).  The  low  value  of  CFAR  (10"'  to  10'^)  eorresponds  to  the  tail  of  the  pdf  For  the 
threshold  analysis,  only  the  targets  that  eorrespond  to  a  CFAR  of  0.1  or  lesser  are 
eonsidered,  sinee  the  threshold  analysis  is  performed  in  this  low  CFAR  region. 
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This  Chi-Square  test  was  performed  at  the  eonfidenee  level  of  95%  for  the  three 
distributions  shown  in  the  Table.  The  results  were  as  shown  in  Table  6.1. 


Table  6.1.  Pass  Pereentages  for  Threshold  Analysis  for  CFAR<  0.1 


Distribution 

Pass  Percentages  (%)  for  RX  samples 

Gamma  2-Parameters 

28.00 

Beta  2-Parameters 

19.44 

Modified  Beta  2-Parameters 

67.36 

The  results  show  that  the  performanee  of  the  modified  two  parameter  Beta  model  is 
superior  to  the  other  two  models  in  the  tail  region.  This  is  basically  because  of  the 
introduction  of  the  parameter  W.’  However  the  pass  percentage  is  still  much  lower  than 
95%  which  would  be  expected  for  a  95%  confidence  level.  The  reason  for  this  is  the  fact 
that  the  number  of  targets  at  the  CFAR  of  0.1  FA/m  is  quite  small  and  the  observation  is 
noisy.  Better  results  may  be  expected  if  a  bigger  area  is  considered  for  background 
modeling  as  would  be  available  from  newer  stair-step  collection. 


6.5.  SELECTION  OF  NUMBER  OF  CLASSES 

The  following  approach  has  been  adopted  for  the  selection  of  the  number  of 
classes.  The  modeling  is  started  with  one  class  and  the  Chi-Square  test  is  performed  on 
the  model.  If  the  test  passes,  the  next  frame  is  modeled  otherwise  the  modeling  is 
repeated  for  the  current  frame  with  two  classes.  After  modeling  with  two  classes,  the 
parameters  are  stored  along  with  the  number  of  classes. 
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6.6.  RESULTS— MODELING 

This  section  shows  the  modeling  of  the  detection  statistics.  In  Figures  6.1  and  6.2, 
the  histogram  of  the  samples  is  shown  in  red  color.  The  samples  are  the  values  of  the 
detection  statistic,  ‘  x Please  see  Section  5.2  for  the  discussion  of  the  detection  statistic 
‘x.’  Figure  6.1  shows  the  fit  of  the  three  distributions  discussed.  Figure  6.2  shows  a 
closer  look  at  the  tail  region  of  the  pdf  in  Figure  6.1.  The  model  that  models  the  tail  best 
would  perform  the  best  for  the  adaptive  CFAR  threshold  selection  in  the  low  CFAR 
region.  In  Figure  6.1,  parts  (a)  and  (c)  show  the  registered  image  from  the  May  2003  data 
whereas  parts  (b)  and  (d)  show  the  modeling  using  two  parameter  Beta,  two  parameter 
Gamma  and  the  modified  two  parameter  Beta  distributions. 


From  the  figures  showing  the  modeling  results,  it  is  very  clear  that  the  modeling 
performance  of  the  modified  two  parameter  Beta  distribution  is  superior  to  that  of  the  two 
parameter  Beta  and  Gamma  distribution  in  the  tail  region.  This  is  mainly  because  of  the 
introduction  of  the  parameter  W,’  that  needs  to  be  considered  because  of  the  modification 
due  to  the  non-max  suppression.  The  modified  two  parameter  Beta  distribution  takes  the 
parameter  ‘N’  into  consideration  whereas  the  two  parameter  Beta  and  Gamma 
distributions  do  not.  Please  see  Section  5.3  for  the  discussion  on  the  parameter  W.’  This 
is  also  the  reason  for  poor  performance  of  the  two  parameter  Beta  and  Gamma  models  in 
table  6.1. 

Note  that  a  low  CFAR  in  the  range  of  10'  to  10'  is  of  interest  and  it  corresponds  to 
the  tail  region  of  the  pdf.  Because  the  modeling  performance  of  the  modified  two 
parameter  Beta  model  is  superior  in  the  low  CFAR  region,  this  model  is  best  for  the 
adaptive  CFAR  threshold  selection  analysis.  This  is  also  shown  by  the  results  of  the  next 
section,  where  the  results  using  the  modified  two  parameter  Beta  model  agree  well  with 
the  predicted  results. 
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Y1G01P3LIN152  to  Y1G01P3LIN154 


Background  Modeling  -  RX  (Y1G01P3LIN1S3) 


(a)  Registered  Image  with  Mines 


(b)  Modeling 


Y1G01P3LIN249  to  Y1G01P3LIN251 


Background  Modeling  -  RX  (Y1G01P3LIN2S0) 


(c)  Background  Only  Registered  Image  (d)  Modeling 

Figure  6.1.  Modeling  of  Detection  Statistic  By  Different  Distributions — ^May  2003  data. 
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Background  Modeling  ■  RX  (Y1G01P3LIN1S3) 


(a)  Tail  Region  for  Modeling  in  Figure  6.1  (b) 


X  10 


Background  Modeling  ■  RX  (Y1G01P3LIN2S0) 


- Histogram  of  samples 

— S —  EM  -  Gamma  2  Parameters 

- EM  -  Beta  2  Parameters 

- EM  -  Modified  2  Parameter  Beta 
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>  -h  o 
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X  -> 


0.22 


0.24 
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(b)  Tail  Region  for  Modeling  in  Figure  6.1  (d) 

Figure  6.2.  Expanded  view  of  the  tail  region  of  modeling  in  Figure  6.1 
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6.7.  RESULTS— CFAR  THRESHOLD  SELECTION 

This  section  shows  the  results  of  the  threshold  analysis  that  has  been  performed  for 
the  May  2003  data.  The  result  has  been  calculated  on  the  dataset  YIGOIPSLIN,  and  the 
results  are  shown  for  following  three  values  of  CFAR; 

(1)0.1 
(2)  0.01 
(3)  0.004 

Figures  6. 3-6. 5  (a)  show  the  threshold  obtained  for  a  given  value  of  CFAR.  The 
number  of  targets  that  are  above  this  threshold  are  shown  in  Figures  6. 3-6. 5  (b).  Figures 
6. 3-6. 5  (c)  and  (d)  show  the  histogram  of  the  targets  for  the  two  parameter  Beta  and  the 
modified  two  parameter  Beta  model  respectively. 

The  histogram  of  the  targets  is  a  good  measure  to  judge  the  modeling  performance 
of  the  distributions  for  a  given  CFAR  value.  For  example  in  case  of  the  CFAR  value  of 
0.1,  the  predicted  number  of  false  alarms  per  frame  is  255x0.1  ~  25.  (Flere  255  is  the  area 
per  frame,  M,’  as  shown  in  Equation  6.7)  From  Figure  6.3,  it  can  be  seen  that  the 
histogram  of  the  two  parameter  Beta  model  is  centered  near  40  whereas  the  histogram  of 
the  modified  two  parameter  Beta  model  is  centered  near  25  which  is  very  close  to  the 
expected  value  of  about  25.  This  tells  that  the  performance  of  the  modified  two  parameter 
Beta  model  is  better  than  that  of  the  two  parameter  Beta  model  for  the  CFAR  of  0. 1 . 
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Threshold  (Y1G01P3LIN)  CFAR=0.1 


Targets  above  the  threshold  (YIGOIPSLIN)  CFAR»  0.1 


55 


5  I - ^ ^ - 1 - ^ ^ - \ - ^ ^ - 1 

0  50  100  150  200  250  300  350  400  450 


Frames  -> 


(a)  Threshold  Determination 


(b)  Targets  Above  the  Threshold 


Histogram  of  Targets  (Y1G01P3LIN)  CFAR  -  0.1  Histogram  of  Targets  (Y1GD1P3LiN)  CFAR-  0.1 


(c)  Histogram  (Beta  2-parameters)  (d)  Histogram  (Modified  2  Parameter  Beta) 
Figure  6.3.  Adaptive  Threshold  Determination  for  CFAR  =0.1 


Ideally,  the  RX  threshold  across  a  run  is  expected  to  be  almost  constant,  but  due  to 
the  non-homogeneities  of  the  data,  there  are  significant  variations  in  the  threshold.  This 
can  be  seen  from  the  results.  For  example  it  can  be  seen  that  the  threshold  obtained  for 
the  frame  nos.  160  to  190  is  somewhat  different  from  the  threshold  obtained  for  other 
frames  in  the  dataset.  On  observing  the  dataset,  it  can  be  seen  that  these  frames  represent 
the  regions  that  are  highly  non-homogeneous  as  compared  to  the  regions  represented  by 
the  other  frames  of  the  dataset. 
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Threshold  (YIGOIPSLIN)  CFAR=0.01 


Targets  above  the  threshold  (YIGOIPSLIN)  CFAR»  0.01 

25  I - ^ ^ - 1 - ^ —  I  I  I  1  T 


- Beta  -  2  Parameters 


Gamma  -  2  Parameters 
- Modified  2  Parameter  Beta 


Frames  -> 


450 


(a)  Threshold  Determination 


(b)  Targets  Above  the  Threshold 


Histogram  of  Targets  (Y1G01P3LIN)  CFAR-  0.01  Histogram  of  Targets  (Y1GD1P3LiN)  CFAR-  0.01 


(c)  Histogram  (Beta  2-parameters)  (d)  Histogram  (Modified  2  Parameter  Beta) 
Figure  6.4.  Adaptive  Threshold  Determination  for  CFAR  =  0.01 


It  can  be  seen  from  the  Figures  6. 3-6. 5  (a)  and  (b)  that  as  the  CFAR  value 
decreases,  the  threshold  obtained  increases  and  the  number  of  targets  above  the  threshold 
decreases.  This  is  because  a  low  value  of  CFAR  corresponds  to  the  tail  region  of  the  pdf. 
As  one  approaches  the  tail  region,  the  CDF  value  of  the  distribution  approaches  one. 
Because  the  threshold  is  obtained  from  the  inverse  mapping  of  the  CDF,  the  threshold 
value  increases  as  the  CDF  value  approaches  one. 


87 


Threshold  (YIGOIPSLIN)  CFAR»  0.004 


Targets  above  the  threshold  (YIGOIPSLIN)  CFAR»  0.004 

Beta  -  2  Parameters 
Gamma  -  2  Parameters 
Modified  2  Parameter  Beta 


200  250  300 

Frames  -> 


(a)  Threshold  Determination 


(b)  Targets  Above  the  Threshold 


Histogram  of  Targets  (Y1G01P3LIN)  CFAR-0.0D4  Histogram  of  Targets  (Y1G01P3LiN)  CFAR- 0.004 


(c)  Histogram  (Beta  2-parameters)  (d)  Histogram  (Modified  2  Parameter  Beta) 
Figure  6.5.  Adaptive  Threshold  Determination  for  CFAR  =  0.004 


The  CFAR  of  0.1  corresponds  to  about  25.5  targets  (255x0.1).  The  histogram  of 
the  modified  two  parameter  Beta  model  is  centered  very  close  to  this  value.  The  CFAR  of 
0.01  corresponds  to  about  2.55  targets  (255x0.01).  The  histogram  of  the  modified  two 
parameter  Beta  model  is  centered  at  4  which  is  about  1  target  higher.  Finally,  the  CFAR 
of  0.004  corresponds  to  about  1  target  (255x0.004)  and  the  histogram  of  the  modified  two 
parameter  Beta  model  is  centered  at  2  targets  which  is  again  1  target  higher.  For  the  low 
value  of  CFAR  (10'  to  10'  ),  the  modeling  results  are  away  by  just  1  target.  This  is  fine 
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because  for  such  a  low  value  of  CFAR,  it  is  possible  to  have  1  extra  target  in  the  given 
region.  In  terms  of  the  anomaly  detector,  the  CFAR  is  of  the  order  of  10'  to  10'  .  The 
targets  detected  at  the  threshold  corresponding  to  this  CFAR  would  be  passed  to  a  False 
Alarm  Mitigation  (FAM)  scheme,  the  aim  of  which  is  to  reject  the  false  alarms. 

From  the  above  plots,  it  is  evident  that  the  predicted  number  of  targets 
corresponding  to  a  given  CFAR  agrees  well  with  the  modeling  results  obtained  from  the 
modified  two  parameter  Beta  model.  The  superior  performance  of  this  model  is  because 
of  the  fact  that  it  takes  the  parameter  "IST  into  consideration,  while  the  two  parameter 
Gamma  and  Beta  models  do  not. 

6.8.  CONCLUSIONS 

This  section  discussed  the  need  for  the  adaptive  CFAR  threshold  selection  and 
shows  the  application  of  the  modeling  of  the  detection  statistic  in  performing  adaptive 
CFAR  threshold  selection.  For  the  adaptive  CFAR  threshold  selection,  good  modeling  in 
the  low  CFAR  (10'  to  10'  )  region  is  required.  It  has  been  shown  that  the  modified  two 
parameter  Beta  distribution  has  the  best  performance  in  the  low  CFAR  region  and 
therefore  it  is  best  suited  for  performing  the  adaptive  CFAR  threshold  selection.  Results 
obtained  are  in  good  agreement  with  the  predicted  results. 
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7.  CONCLUSIONS  AND  FUTURE  WORK 

In  this  thesis,  the  EM  algorithm  is  investigated  and  used  for  various  applieations.  In 
Seetions  3  and  4  the  algorithm  is  used  to  model  the  baekground  data  and  segment  the 
image  into  elasses.  The  eoneept  of  elass  membership  is  used  to  implement  the  SEM- 
based  anomaly  deteetors.  The  results  for  segmentation  were  good.  The  SEM-based 
anomaly  deteetors  performed  well  and  in  the  ease  of  buried  mines  their  performanee  was 
slightly  better  than  that  of  RX. 

In  Seetions  5  and  6,  the  EM  algorithm  was  used  to  model  the  deteetion  statistie  of 
the  anomaly  deteetor.  This  modeling  was  then  used  to  perform  the  adaptive  GEAR 
threshold  seleetion  to  determine  the  optimum  number  of  targets  for  the  given  GEAR. 
Here  also  the  modeling  results  were  very  good.  The  results  showed  that  the  deteetion 
statistie  ean  be  very  well  modeled  with  the  two  parameter  Gamma  and  Beta  distributions. 
The  results  of  the  adaptive  GEAR  threshold  seleetion  performed  using  the  modified  two 
parameter  Beta  model  were  in  good  agreement  with  the  expeeted  number  of  targets. 

While  implementing  the  EM  algorithm,  it  has  been  observed  that  sometimes  a  large 
number  of  iterations  are  needed  before  the  algorithm  eonverges  to  a  steady-state  value.  In 
literature,  different  methods  are  mentioned  to  speed  up  the  eonvergenee  of  the  algorithm. 
Some  of  these  are  mentioned  in  [1].  In  future,  these  methods  ean  be  employed  to  speed 
up  the  eonvergenee  of  the  EM  algorithm.  This  would  direetly  reduee  the  amount  of  time 
needed  to  proeess  the  applieations  that  use  the  EM  algorithm  to  model  the  data. 

The  EM-based  segmentation  eould  be  tested  on  multi-band  data.  This  would  help 
segmenting  the  image  based  on  spatial  and  speetral  eorrelation.  Praetieal  issues  sueh  as 
eonsisteney  of  elass  assignment  over  eonseeutive  frames  need  to  be  addressed.  Better 
methods  for  seleeting  the  number  of  elasses  must  be  studied. 
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APPENDIX— A 

DISTRIBUTIONS  AND  UPDATE  EQUATIONS 
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This  appendix  has  two  sections.  The  first  section  presents  the  various  distributions 
and  the  corresponding  update  equations.  The  second  section  explains  how  these  update 
equations  are  used  to  estimate  the  parameters  of  the  mixture  model. 

A.l,  UPDATE  EQUATIONS 

In  this  section,  the  distributions  of  the  various  models  are  given.  The  partial 
derivatives  that  are  used  to  obtain  the  update  equations  are  also  presented  for  the  various 
mixture  models. 

Please  refer  to  the  following  terminology  for  all  the  equations  presented  in  this  section. 

n  =  Number  of  samples 
g  =  Number  of  classes 


(A.l) 

(A.2) 

(A.3) 

(A.4) 

(A.5) 

(A.6) 

(A.7) 
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where  log(x)  denotes  natural  logarithm  and  z.j\s,  the  probability  that  the /*’  (7  =  1,  2... 
n)  sample  arose  from  the  f'  {i  =  1,  2...  g)  class.  For  all  the  mixture  models,  the 
coefficient  of  the  f'  {i  =  1,  2...  g)  class,  ;r, ,  is  given  by: 


TZ:  - 


(A.8) 


Let  the  partial  derivative  of  the  cost  function,  ‘‘Q’ ,  with  respect  to  some  parameter 

do 

‘  (p  ’  (that  is  to  be  estimated  from  the  data)  be  given  by  — ,  then  the  update  equation  for 

d(p 

this  parameter  is  given  by: 


d(p 


where,  ‘£”  is  the  expectation  operator. 

A.  1,1,  Gaussian  Mixture  Model,  N{x 

Here,  ‘  ’  is  the  mean,  ‘  E  ’  is  the  covariance  matrix  and  V’  is  the  dimensionality 

of  the  data,  ‘  x  .’  The  multivariate  Gaussian  distribution  is  given  as: 

/(^)  =  {^27^)“^^  ~  ^  ~ 

The  ML  estimates  of  the  parameters  of  a  Gaussian  mixture  model  are  given  by: 


//  = 


j=\  1=1 


n 


(A.10) 
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n  g 


S  = 


7=1  i=l 


(A.  11) 


A,1.2.  Two  Parameter  Beta  Model,  Beta{x 


/W=  A  (  ,  0<x<\;r,n>0 


B{r,ri) 


(A.12) 


glog[/(^)] 

8/ 


& 

^  z  J'F  (f  + -  ^^(7)  +  log(x)] 


(A.13) 


i=l 


51og[/(^)]  =  +  +  (A.14) 

'  1=1 

A.1,3.  Three  Parameter  Beta  Model,  Beta{x  :  y,ri,X). 

f{x)  =  7  r  ^  n —  ,  0<x<1;/,;7,;L>0  (A.15) 


glog[/(^)]  ^  ^ ^  ^ri)-^>{Y)+  log(;L)  +  log(x) - log(l - (l - X)x)]  (A.  1 6) 


glog[/(^)]  ^  ^  ^  iog(i  _  (1  _  2)x)] 

'  /=1 


(A.17) 


51og[/(^)] 

dx 


\r  (/  +  vy 

^  log(l  -  (1  - /l)x) 


(A.  18) 
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A,l,4.  Modified  Two  Parameter  Beta  Model,  Beta{x :  /,7j,N). 


Let,  f^ix)  = 


0  <  X  >  0 


Then  the  modified  two  parameter  Beta  model  is  given  by: 


=  (N>0) 

iV 


(A.19) 


(A.20) 


where  F2(x)  is  the  eumulative  distribution  funetion  of  /j  (x) . 
K  is  given  by: 


0 


(A.21) 


^iog[/M_ 

dy 


& 

i=\ 


'Fly  + - 'Fly )  +  log(x)  + 


N 


A  + 


B{y,ri)\  "  y{y  +  ri) 


-A 


(A.22) 


^log[/l^)] 

drj 


'Fly  +  ri)-^{ri)  +  logll 


A  + 


rh  I 

ri{y  +  ri)\ 


51og[/l^)] 

dN 


g 


z=l 


'1a  +  1)-1 

KN^ 


(A.23) 

(A.24) 


Here  all  the  integrals  ‘  ’  are  same  as  defined  previously  with  the  parameter  X  =  \ 
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A,l,5.  Modified  Three  Parameter  Beta  Model,  Beta{x  :r,7j,X,N). 


^  (A.25) 

Then  the  modified  three  parameter  Beta  model  is  given  by: 

/(^)  =  ^/3 ( A  >  0 )  (A.26) 


where  F3(x)  is  the  eumulative  distribution  funetion  of  (x)  and  ‘K’  is  given  by: 


K  =  (A.27) 

0 


=  '^^i^{y  +  'n)-^{y)+ iog(;L) + log(x)  -  iog(i  -  (i  -  2.)x)] 


+  Z; 


1  =  1 
NX' 


(A.28) 


51og[/(^)]. 

drj 


+  z, 


^  z  J'FI/  + -  ^^(77)  +  log(l  -  x)  -  log(l  -  (l  -  ;i)x)] 

/A 


Nyj 


B[y,v) 


^3-^4  + 


ri{y  +  ri)\ 


(A.29) 


51og[/(^)] 


dx 


y 


[y  +  7) 


IX  NX'^'  f  r  N  \r 

+— — -^(7 +7)^2 


A  iog(i  -  (1  -  ;i)x)  B{r,?]) 


(A.30) 
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51og[/(x)] 


dN 


g 

=  ^zJ1-F3(x)]  + 


/=1 


r^(iv  +  i)-i 

KN^ 


A,l,6,  Two  Parameter  Gamma  Model,  Gamma{x  :k,X). 


fix)  = 


r{k) 


0<x<oo;k,/l>0 


=  £zJlo8(A)+log(x)- <■(*)] 

/  =  1 

aiog[/(x)]  ^  \^-x 

dA  \A 

1=1 


(A.31) 


(A.32) 


(A.33) 


(A.34) 


A.2,  PARAMETER  ESTIMATION  FROM  THE  UPDATE  EQUATIONS 

In  this  seetion,  the  application  of  the  update  equations  to  estimate  the  set  of 
parameters  of  the  mixture  model  is  shown.  Let  us  consider  the  estimation  problem  of  the 
three  parameter  Beta  distribution,  Beta( x\y,r],A) 

Once  the  update  equations  are  obtained,  the  information  matrix  is  formed.  The 
information  matrix,  ‘7^,’  is  a  square  matrix  with  the  dimension  ^p,'  where  "p'  is  the 
number  of  parameters  to  be  estimated.  In  the  case  of  Beta(x ;  f,7,/l ),  p  =  3.  Each 
diagonal  element  of  the  information  matrix  is  the  derivative  of  the  log-likelihood  of  the 
complete  data  with  respect  to  the  estimated  parameter.  Thus,  in  the  case  of 
Beta(x  ;  /,ri,A),  this  matrix  is  obtained  as  follows: 
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y  plog[/(x)]?  Q  Q 

yL  J 

0  yplog[/(x)]?  Q 

*“  5/7 

y=l  L  /  J 

0  0  yplog[/(x)]? 

^  ai 

j=\  >- 

where  log[/ (x)]  is  the  log-likelihood  of  the  eomplete  data. 

The  value  of  the  off-diagonal  elements  of  the  information  matrix,  gives  the 
dependenee  of  one  parameter  over  the  other  [49].  In  ease  of  the  EM  algorithm,  sinee  the 
parameters  are  often  assumed  to  be  independent,  the  off-diagonal  elements  ean  be  taken 
to  be  zero.  Let  y’'  ,ri^  be  the  estimates  of  the  parameters  in  the  iteration,  then  the 
estimate  of  these  parameters  in  the  (k+lf'  step  is  given  by: 

glog[/(^)] 

dy 

glog[/(^)] 

drj 

glog[/(^)] 

a;i 

This  new  estimate  of  the  parameters  ,  ^ i^+i , , , ^ i+i , , , caleulate  the  log- 
likelihood  funetion  again  in  the  next  iteration.  This  new  log-likelihood  funetion  is  again 
maximized  to  get  the  new  set  of  parameters  ' y'‘^^' in  the  (k+2/^  iteration. 
The  proeess  eontinues  until  the  parameters  eonverge  to  a  steady  state  value.  Please  see 
seetion  2.5  for  further  details  on  the  ealeulation  of  the  log-likelihood  funetion. 
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APPENDIX— B 

TEST  STATISTICS  TO  MEASURE  GOODNESS  OF  FIT 
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This  appendix  discusses  some  of  the  tests  to  measure  goodness  of  fit. 

B.l.  CHI-SQUARE  TEST 

After  the  visual  inspection  of  the  modeling  results,  it  is  worthwhile  to  have  a 
quantitative  analysis  of  the  performance  of  the  different  models.  One  such  comprehensive 
test  is  the  Chi-Square  test.  It  has  the  following  test  statistic  [45]; 


2  ^  Y  {Oi-E)^ 
^  ^  E. 

z=l  _ 


(B.l) 


where, 


=  Test  Statistic 

O.  =  Observed  value  for  i*''  observation 
E.  =  Estimated  value  for  i*'’  observation 


n  =  Number  of  observations 


Sometimes  a  correction  known  as  the  “Yates  correction”  is  also  applied  to  account 
for  the  quantization  error.  The  Yates  correction  is  introduced  as  a  correction  for 
discontinuity.  In  that  case,  the  test  statistic  is  modified  as: 


2  ^  Y  {\0‘-E‘\-cy 

^  ^  Ei 

t=l  _ 


(B.2) 


where  ‘c’  is  the  Yates  correction  and  generally  c  =  0.5 

The  degree  of  freedom  required  to  calculate  the  threshold  is  calculated  as  follows. 
First  the  samples  are  grouped  in  bins  in  such  a  way  that  there  is  at  least  a  certain  number 
of  samples  per  bin.  If  the  number  of  bins  are  ‘  ’  and  there  are  'p'  parameters  that  have 

been  estimated  from  the  data,  then  the  degrees  of  freedom,  ‘  v ,’  is  given  as: 


(B.3) 
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The  hypothesis  Ho  :  X F  is  rejeeted  -  Zi-a 


B,2,  CRAMER  VON-MISES  (CVM)  TEST 

The  test  statistie  for  this  test  is  given  as  [45]; 


CM  =  —  + 
\2n 


n 

I 


i=\ 


/-0.5 


n 


(B.4) 


where, 


n  =  Number  of  observations. 

F(xj.„;6')  =  CDF  of  the  ordered  observations,  given  0. 

6  =  Parameter  vector  of  the  given  distribution. 

Here,  CDF  is  the  Cumulative  Distribution  Function.  An  approximate  size  ‘a’  test 
of  Ho :  X  F  is  to  reject  Ho  if  CM  >  CM /_ «.  The  critical  values,  CM /_ «,  are  given  in  [45] 
for  several  values  of  ‘a’  and  censoring  levels.  If  the  parameters  are  estimated  from  the 
data  then  the  value  of  ‘  6*  ’  is  replaced  by  its  maximum  likelihood  estimate. 

B.3.  KOLMOGOROV-SMIRNOV  (KS)  OR  KUIPER  TEST 

In  order  to  study  this  statistic  let  us  assume  that  [45], 
n  =  Number  of  observations. 

f(xj.„;6')  =  CDF  of  the  ordered  observations,  given  ‘  6* .’ 


9  =  Parameter  vector  of  the  given  distribution. 


max. 


i 

n 


fC\0) 


(B.5) 


(B.6) 


D  =  max 


n 
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D  =  max(z)^,Z)  ) 

(B.7) 

V  =  D^  +D 

(B.8) 

The  statistic  given  by  ‘Z)’  is  known  as  the  “Kolmogorov-Smimov”  or  “KS”  statistic 
and  the  statistic  given  by  ‘F  is  known  as  the  “Kuiper”  statistic.  Here  also,  the  ‘a’  test  of 
//o  .•  F  is  to  reject  Hq  if  KS  >  KSj.  The  critical  values  for  this  statistic  are  given  in 
[45]  for  several  values  of ‘a’  and  censoring  levels. 
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